What is the Dual Write Problem? | Designing Event-Driven Microservices

  • Published May 28, 2024
  • ► LEARN MORE: cnfl.io/microservices-101-mod...
    The dual write problem occurs when you try to write to two separate systems and need the writes to be atomic. If one write fails and the other succeeds, you can end up with inconsistent state. This is an easy trap to fall into, and it can be difficult to avoid. We'll explore what causes the dual write problem and examine both valid and invalid solutions to it. (A minimal code sketch of the problem appears at the end of this description.)
    Check out the Designing Event-Driven Microservices course on Confluent Developer for more details: cnfl.io/microservices-101-mod...
    RELATED RESOURCES
    ► What is the Transactional Outbox Problem?: • What is the Transactio...
    ► What is the Event Sourcing Pattern?: • What is the Event Sour...
    ► What is the Listen to Yourself Pattern?: • What is the Listen to ...
    ► Eliminating the Double Write Problem in Apache Kafka Using the Outbox Pattern: cnfl.io/3UhVbVC
    ► Microservices: An Introduction: cnfl.io/3ZMt3up
    ► Event-Driven Microservices Architecture: cnfl.io/48FSYbj
    ► Migrate from Monoliths to Event-Driven Microservices: cnfl.io/3tsqlhu
    ► Get Started on Confluent Developer: cnfl.io/48FnKRB
    CHAPTERS
    00:00 - Intro
    00:23 - How to emit events in an Event-Driven Architecture
    00:55 - What is the Dual Write Problem?
    02:19 - Can you avoid the Dual Write problem by emitting the event first?
    02:35 - Can a transaction help avoid the Dual Write problem?
    03:54 - What is the Change Data Capture (CDC) pattern?
    04:12 - What is the Transactional Outbox pattern?
    04:29 - What is the Event Sourcing pattern?
    04:42 - What is the Listen to Yourself pattern?
    05:43 - Closing
    --
    ABOUT CONFLUENT
    Confluent is pioneering a fundamentally new category of data infrastructure focused on data in motion. Confluent’s cloud-native offering is the foundational platform for data in motion - designed to be the intelligent connective tissue enabling real-time data, from multiple sources, to constantly stream across the organization. With Confluent, organizations can meet the new business imperative of delivering rich, digital front-end customer experiences and transitioning to sophisticated, real-time, software-driven backend operations. To learn more, please visit www.confluent.io.
    #confluent #apachekafka #kafka
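    A minimal sketch of the failure window described above (Python, with sqlite3 standing in for the database; publish_event is a hypothetical stand-in for the second system, such as Kafka):

    import sqlite3

    def publish_event(event: dict) -> None:
        ...  # hypothetical stand-in for the write to the second system

    def handle_order(conn: sqlite3.Connection, order_id: str) -> None:
        with conn:  # commits the database write on success
            conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        # If the process dies right here, the database has the order but the
        # event is never emitted: the two writes are not atomic.
        publish_event({"type": "OrderPlaced", "order_id": order_id})

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
    handle_order(conn, "order-42")

    Reordering the two writes only flips which side goes missing; the video walks through why, and through the patterns that actually close the gap.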
  • Science & Technology

COMMENTS • 107

  • @ConfluentDevXTeam
    @ConfluentDevXTeam 3 months ago +6

    Wade here. In case you missed it, this video is part of the Microservices 101 course. You can find more videos on microservices, distributed systems, and other related topics in this playlist: ua-cam.com/video/UStWv62FX-k/v-deo.html

  • @nitingaur1707
    @nitingaur1707 3 months ago +16

    The dual write problem is part of the larger challenge of distributed transaction handling, which is a very complex process with a lot of moving parts. In a microservices architecture, where services operate independently of each other, distributed transaction management is replaced by a series of local transactions, known as the saga pattern.
    In this video, the presenter illustrates the issue that arises when implementing a saga with choreography. The presenter did not mention 2PC (Two-Phase Commit), which is an alternative option. It is less preferred because it requires XA-compliant transactional resources to participate. The patterns mentioned in the video favour an eventual-consistency approach.
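    For illustration, the compensation idea behind a choreographed saga might be sketched like this (Python; every name here is hypothetical, and a real saga must also persist its progress so it can resume or compensate after a crash):

    class PaymentFailed(Exception): ...

    def reserve_inventory(order_id: str) -> None: ...  # local transaction 1
    def charge_payment(order_id: str) -> None: ...     # local transaction 2
    def release_inventory(order_id: str) -> None: ...  # compensation for step 1

    def place_order_saga(order_id: str) -> None:
        reserve_inventory(order_id)
        try:
            charge_payment(order_id)
        except PaymentFailed:
            release_inventory(order_id)  # undo the earlier local transaction
            raise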

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +4

      Wade here. The context of the video is focused on cases where 2PC is not an option. If 2PC is available, then that's absolutely an option, although it has its own set of advantages and disadvantages.

  • @ConfluentDevXTeam
    @ConfluentDevXTeam 3 months ago +20

    Wade here. Just sharing my own experience with this problem. In my early days of building event-driven software, I would take the approach of ignoring the dual-write problem. We tried all sorts of things to make it not matter. However, we often found ourselves in situations where events were missing, or we had extra events that didn't make sense. We'd try going back through the records to understand how that happened and usually ended up with no logical explanation. While I can't prove today that this was the dual write problem, it seems like a pretty logical candidate.

    • @cryingwater
      @cryingwater 3 months ago +2

      Every developer's worst nightmare is a bug that happens once in a blue moon.

  • @akashagarwal6390
    @akashagarwal6390 3 months ago +4

    The most concise, crisp, to-the-point video on this topic to date.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Glad you enjoyed the video. Keep watching the channel for more like it.

  • @jewpcabra666
    @jewpcabra666 3 months ago +9

    I have definitely been working through this problem but never had the right name for it. Thankfully I got to one of the approaches for solving it, but it's definitely a hairy problem that requires forethought!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +3

      Wade here. Agreed. I think part of the issue is that it isn't something that everyone knows the name of. If it was part of our lingo, we wouldn't get bit by it as often as we do. Hopefully, that's where videos like this can help.

  • @gimmebytes
    @gimmebytes 2 months ago

    When engineering teams start saying "We assume that...", it is often a sign of not knowing the fallacies of distributed systems and, hence, of building software that contains the flaws mentioned here. Nice and crisp summary!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      Wade here. Glad you enjoyed it. Unfortunately, when it comes to the Dual Write Problem, you are probably right. I think a lot of people simply aren't aware that it is a problem and why. Reasoning through the various edge cases and race conditions involved with distributed systems is hard. It takes a measure of practice to get used to it.

  • @rubenvervaeke1635
    @rubenvervaeke1635 3 months ago +11

    A difficult problem, which I find difficult to explain to colleagues. Thank you for such a clear and instructive explanation!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. I'm glad you enjoyed the video. It is a tricky one to explain. If someone hasn't been bitten by it, then it can be tough to understand the consequences. And it always seems like if you just tweak this one little thing (i.e. transactions, or moving where you emit), then you can make it not matter, but unfortunately, it's not that simple. I'd love to know if you have had success with explaining it in the past. If not, hopefully, this video can help provide an explainer in the future.

    • @rubenvervaeke1635
      @rubenvervaeke1635 3 months ago +1

      @@ConfluentDevXTeam I'd like to believe I had success explaining it to colleagues. I also used a lot of visual diagrams, which pinpoint almost every potential failure occurrence (inter-process failure, message publication failure, message acknowledgement failure, ...). It helped a lot in conveying why it is impossible to implement an atomic operation spanning multiple different processes.

  • @Lexaire
    @Lexaire 3 months ago +1

    Great video and really appreciate your discussions in the comments!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. I'm glad you noticed the discussion in the comments. I do my best to provide something meaningful where possible. That's not always easy, but I figure if people are taking the time to comment, it's the least I can do.

  • @zxborg9681
    @zxborg9681 3 months ago +2

    I've never written a microservice (whatever that even is). But having done hardware design and embedded realtime software for the last 30 years, this stuff is just another spin on the same universal problems. Very familiar in general terms.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Yeah, it's not really a "microservices" problem. It's a distributed systems problem. But even then, how we define a "distributed system" is somewhat loose. This problem can occur even in situations where the "distributed" label wouldn't normally be applied. For example, if you had an application writing to two different files on a single piece of hardware, this problem could still exist.

  • @9akashnp8
    @9akashnp8 3 months ago +1

    Learnt something new, thank you!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Glad to hear it. If you've learned something watching the video, then I've done my job.

  • @kevinding0218
    @kevinding0218 24 days ago

    3:41 certainly could be worth exploring further as a separate topic! Especially if instances of derived events being mishandled, overlooked, or redirected to a DLQ occur, as this could reintroduce the dual write problem.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 22 days ago +1

      Wade here. You are right, exploring deduplication of events would definitely be an interesting topic for further exploration. I'll make a note of that.

    • @kevinding0218
      @kevinding0218 20 days ago

      @@ConfluentDevXTeam Thanks Wade, I really appreciate that! In addition, I'm interested in exploring graceful failure management in event-driven architecture, but I haven't found a convincing solution or resource.
      Typically, retries are the first solution considered, but they can lead to problems. With Kafka, for example, continually retrying an event on the same topic may block offset progression and cause consumer lag. Redirecting events to a retry topic can also present resilience issues, such as potential failures in posting the retry event. (Unfortunately, unlike a message queue that has retries designed in by nature, Kafka's offset-progression mechanism introduces trade-offs here.)
      Often, after multiple retries, events are moved to a Dead Letter Queue (DLQ) or tracked via metrics. However, I've noticed that events in the DLQ tend to be neglected indefinitely, and developers often ignore them in the first place. This can lead to unrecognized issues that become increasingly difficult to address over time.
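      A rough sketch of that retry-topic/DLQ routing (Python with the confluent-kafka client; the topic names, the retries header, and process() are illustrative):

      from confluent_kafka import Consumer, Producer

      MAX_RETRIES = 3

      def process(value: bytes) -> None: ...  # hypothetical business logic

      consumer = Consumer({"bootstrap.servers": "localhost:9092",
                           "group.id": "orders-processor",
                           "auto.offset.reset": "earliest"})
      producer = Producer({"bootstrap.servers": "localhost:9092"})
      consumer.subscribe(["orders"])

      while True:
          msg = consumer.poll(1.0)
          if msg is None or msg.error():
              continue
          try:
              process(msg.value())
          except Exception:
              # Route the failure onward instead of blocking this
              # partition's offset progression.
              headers = dict(msg.headers() or [])
              retries = int(headers.get("retries", b"0").decode())
              target = "orders.retry" if retries < MAX_RETRIES else "orders.dlq"
              producer.produce(target, msg.value(),
                               headers={"retries": str(retries + 1).encode()})
              producer.flush()

      A real deployment would run a similar consumer on orders.retry (often with a delay), and, as you say, the hard part is making sure orders.dlq is actually watched.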

    • @kevinding0218
      @kevinding0218 20 days ago

      One practical application we've observed involves using CDC to synchronize data from a NoSQL database to Elasticsearch. If an operation like a deletion fails and isn't addressed after entering the DLQ, it leads to discrepancies between the two data sources. This situation becomes difficult to rectify without performing another full synchronization.

  • @VincentJenks
    @VincentJenks 3 months ago

    Great vid! We ran into all the same issues in 2017 when designing and architecting our first microservices platform. We used Kafka with Node and the Listen to Yourself pattern, and we were quite successful with it! It seems like the path of least resistance and complexity.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. The Listen to Yourself pattern is definitely a great option for solving this problem. It can be very effective if your use case can tolerate the built-in eventual consistency.
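      For anyone unfamiliar with the mechanics, a minimal sketch (Python with confluent-kafka and sqlite3; the topic and table names are illustrative). The command handler performs only one write, to Kafka, and the same service consumes its own topic to update its state:

      import json
      import sqlite3
      from confluent_kafka import Consumer, Producer

      producer = Producer({"bootstrap.servers": "localhost:9092"})

      def handle_command(order_id: str) -> None:
          # The only write: emit the event. No database write here.
          producer.produce("orders", json.dumps({"id": order_id}).encode())
          producer.flush()

      def listen_to_yourself(conn: sqlite3.Connection) -> None:
          consumer = Consumer({"bootstrap.servers": "localhost:9092",
                               "group.id": "orders-self-listener"})
          consumer.subscribe(["orders"])
          while True:
              msg = consumer.poll(1.0)
              if msg is None or msg.error():
                  continue
              event = json.loads(msg.value())
              with conn:  # state is built only from the consumed event
                  conn.execute("INSERT OR IGNORE INTO orders (id) VALUES (?)",
                               (event["id"],))  # idempotent against redelivery

      The eventual consistency mentioned above is visible here: a read from the orders table can briefly lag the command that produced the event.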

  • @JakeBerg777
    @JakeBerg777 3 months ago +3

    This was great 👍🏻 thank you!

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. Glad you enjoyed it. Keep an eye on the channel for more like it.

  • @new_wo_rld
    @new_wo_rld 3 months ago

    Concise and very useful

  • @omernacisoydemir
    @omernacisoydemir 3 months ago +4

    This is a very descriptive video. I have a question about the message relay layer, where the outbox table is read and events are forwarded. I implemented the polling publisher pattern specific to my needs. What are the disadvantages compared with CDC (Debezium)?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Thanks for the feedback. I'm glad you enjoyed the video. At a basic level, I will say the advantage of a CDC solution is that a lot of the logic is automated. Basically, you have to write less code. If you want more detail, you might consider dropping into our community channels and posing your question there. www.confluent.io/community/

    • @LewisCowles
      @LewisCowles 2 months ago

      @@ConfluentDevXTeam Won't that also mean failures are hidden? We were excited about CDC, and then I spoke to a trusted ex-colleague who said: kiss goodbye to domain events if change/CRUD events are your source. I think they're right, as not all systems support CDC; it also makes holes in the system, which likely looks green at all times.

  • @mitchellyuen7961
    @mitchellyuen7961 2 months ago

    This is amazing 🙌

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      Wade here. Thanks for the feedback. Glad you enjoyed it.

  • @mitchross2852
    @mitchross2852 3 months ago +1

    Great video

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Thanks. Glad to hear you enjoyed it.

  • @kaushikmitra28
    @kaushikmitra28 3 months ago +2

    Hi, what if we enable transactions in the Kafka producer and defer the email logic to a Kafka consumer that is set to the read_committed isolation level?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +2

      Wade here: Unfortunately, Kafka transactions don't solve this problem. Remember, your Kafka transaction and your DB transaction aren't linked. You still have a potential point in your code where maybe you have committed your DB transaction, but haven't committed your Kafka transaction, or vice versa. If you get a failure at that point, one thing is committed, and the other is not, so you still have the dual write problem. Now, on the flip side, Kafka transactions might be really helpful when consuming an outbox or doing event sourcing to ensure that you don't get duplicate messages in Kafka.
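      To make that window concrete, a sketch (Python, using confluent-kafka's transactional producer and sqlite3 as a stand-in database):

      import sqlite3
      from confluent_kafka import Producer

      producer = Producer({"bootstrap.servers": "localhost:9092",
                           "transactional.id": "order-service-1"})
      producer.init_transactions()

      def handle(conn: sqlite3.Connection, order_id: str) -> None:
          producer.begin_transaction()
          producer.produce("orders", order_id.encode())
          with conn:
              conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
          # The DB transaction is committed; the Kafka transaction is not.
          # A crash right here still loses the event, because the two
          # commits are independent: the dual write problem remains.
          producer.commit_transaction()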

  • @warpmonkey
    @warpmonkey 2 months ago

    Great video. Something is wrong with the levels on your mic, though; it's crackling on louder sounds much more than it should.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      Wade here. Glad you enjoyed the video, minus the mic issues. I have a few videos where the mic seems to be turned up too high causing the audio to peak. I think I have fixed it now, but unfortunately, this must have been one of the videos where the problem was present.

  • @RicardoSilvaTripcall
    @RicardoSilvaTripcall 3 months ago +1

    Can we use the saga pattern in this case? If something goes wrong, we prepare compensations and execute them on all previously executed steps in the workflow ...

    • @profitableloser3879
      @profitableloser3879 3 months ago

      The saga pattern can be used for small-scale systems; sagas are difficult to scale. If you are working on a large-scale app, you would want to opt for the patterns explained in the video.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +2

      Wade here. In some cases, the Saga pattern could be a valid solution. However, in others, it might not work. Much like with any of the solutions presented, there are cases where it's a good idea and cases where it isn't.

  • @vanthel
    @vanthel 3 months ago

    Is there any issue with CDC that tails replication logs on, say, Postgres? How do you do failovers in a way that gives the best availability of the CDC processor while replication slots are stopped and started?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Databases such as Postgres favor the Consistency side of the CAP theorem, so Availability takes a back seat. As a result, there are always going to be potential availability issues. You might check the documentation for Postgres CDC or try a Postgres user forum for specific advice on how to tune the CDC processor for this type of use case.

  • @dimitrikalinin3301
    @dimitrikalinin3301 3 months ago +3

    Brilliant! Thanks a lot! Based on my experience, it's much harder to explain it to the stakeholders to get additional resources for the proper implementation. Especially in the cloud environment. Components are reliable - they say... It will never happen - they say...

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. That's part of the issue with the dual write problem. It tends to be easy to ignore and pretend it won't be a problem. As with many software problems, until you have been bitten by it, it can be hard to really think of it as a problem. One of the keys, in my opinion, is to focus on the consequences. Let's assume the dual write problem will happen, even if it's rare. When it does happen, what are the consequences? If you can live with them, then maybe you don't need a solution. But if you can't live with them, then you better implement a solution.

  • @dmitriizheleznikov2949
    @dmitriizheleznikov2949 2 months ago

    thanks

  • @ThePlaygroundFire
    @ThePlaygroundFire 3 months ago

    Regarding 2:36, what about using Apache Camel? In that scenario, you could have a transacted route that writes to the database and then emits an event to a Kafka topic. That would ensure both writes are wrapped in a transaction.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. I can't really speak specifically for Apache Camel. If it separates the database writes and the Kafka writes into separate processes, then it might be a valid solution. Essentially, it would be implementing something similar to the solutions I describe at the end of the video. If it tries to do both at the same time, then it's unlikely to work.

    • @logiciananimal
      @logiciananimal 2 months ago

      @@ConfluentDevXTeam Seems similar to Microsoft's DTC, which, at least as advertised, allows cross-process transactions (e.g., with WCF). (I tried to use it long ago and failed, so I'm unsure.)

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      @@logiciananimal Wade here. I'm not deeply familiar with DTC, but a transaction coordinator can potentially be used to solve the dual-write problem. For it to work, though, it can't operate purely in memory; it's going to need persistence mechanisms. And quite often, the transaction coordinator is going to be based on something like the Listen to Yourself Pattern, or Event Sourcing, etc. I.E. It's going to write some kind of event, and then use that to trigger logic updates to other sources. That way, if a failure occurs, it can retry. The key, as outlined in the video, is to separate the process into an initial write, and then one or more secondary writes that depend on the first.

  • @IIIxwaveIII
    @IIIxwaveIII 3 months ago

    Many messaging systems are XA compliant and as such support transactions that span multiple systems. Kafka, as far as I know, does not support XA, but many others do...

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +3

      Wade here. This video is definitely focused on cases where you can't rely on things such as Two-Phase Commit. If those techniques are available, then they can certainly be used. Having said that, those techniques come with their own advantages and disadvantages. So it comes down to evaluating what your needs are against the solution.

  • @alhassanreda2502
    @alhassanreda2502 3 months ago +2

    Why is there no solution that confirms status? E.g., record the status in the DB, and after the event, confirm the status.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. I'm not sure if I fully understand what you are asking, so let me see if I can clarify. I think you are asking whether you could solve this by recording the status of the operations in the database. For example, you write a record to the database with a status flag, write to Kafka, and then update the status flag to indicate that the Kafka send was completed successfully. Assuming that is what you are asking, it is a potential solution, if done a certain way. If you try to do it in the same process, you are back to the original dual write problem. What if the Kafka write succeeds, but the status write fails? On the other hand, if you break it into separate processes, then it could work. The first process updates the DB and sets the status flag indicating that it hasn't been published to Kafka. An async process reads the DB looking for those status flags. It then writes the appropriate records to Kafka and updates the status. If the status update fails in this case, you try again, which could result in duplicate Kafka records.
      One of the challenges to this approach is deciding what you need to write to Kafka. How do you translate the current state of the DB into an Event to be sent via Kafka? And what happens if there are multiple updates in a short period? Will you potentially lose previous events? Is that okay? That's why this type of approach would often be combined with Event Sourcing or the Transactional Outbox pattern to ensure that events are never missed.
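      A sketch of that separate async process (Python with sqlite3; publish() is a stand-in for the Kafka write, and the schema is illustrative):

      import sqlite3
      import time

      def publish(payload: str) -> None: ...  # stand-in for the Kafka write

      def relay_unpublished(conn: sqlite3.Connection) -> None:
          while True:
              rows = conn.execute(
                  "SELECT id, payload FROM events WHERE published = 0").fetchall()
              for event_id, payload in rows:
                  publish(payload)                    # step 1: send to Kafka
                  with conn:                          # step 2: flip the flag
                      conn.execute(
                          "UPDATE events SET published = 1 WHERE id = ?",
                          (event_id,))
                  # A crash between step 1 and step 2 re-sends the row on the
                  # next pass: at-least-once delivery, so consumers must
                  # tolerate duplicates.
              time.sleep(1.0)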

    • @alhassanreda2502
      @alhassanreda2502 3 months ago

      @ConfluentDevXTeam Wade, you are smart and your answer is fully correct. However, my solution looks more like a blockchain of statuses. Here's how it works:
      1. Status Writing in Queue: Initially, a status is written in a queue, for example, OP1.
      2. Event Sent to Kafka: When Kafka receives the event, it adds a new status to the queue status, turning it into OP1_R1.
      3. Service Processing: After the service completes its work, the status is updated to reflect the outcome: in the case of a failure, it becomes OP1_R1_F1, and in the case of success, OP1_R1_S1.
      4. Status Display: When needing to show the user the status of the request, we convey success by showing OP1_R1_S1.
      Solution Implementation: This solution can be implemented using queue status in a distributed cache, trace, or event log.
      Final Status and Database Update: The final status can be taken from the queue and saved in the database. Even if this update fails, you can retry from the queue again.
      This approach, akin to a "blockchain of statuses," ensures that every status update is traceable and recoverable, addressing the concern of confirming status updates and overcoming potential failures in updating the database. By leveraging a distributed queue and Kafka, we can ensure that the status is both distributed and consistent across the system, allowing for retries in case of failure and ensuring no status confirmation is lost.
      I understand the concerns you raised regarding the potential for losing previous events with rapid updates, and this is where the design of the event log and the distributed cache plays a crucial role. By carefully managing the queue and ensuring that each status update is atomic and traceable, we mitigate the risk of lost updates. This approach offers resilience and reliability in tracking the status of each operation within the system, aligning with the principles of Event Sourcing and the Transactional Outbox pattern to ensure comprehensive event handling and status tracking.
      Thank you for sparking this engaging discussion, Wade. Your insights are invaluable, and they've helped clarify the robustness and potential of this solution in addressing the dual write problem and ensuring consistent status updates.

  • @asian1nvasion
    @asian1nvasion 3 months ago

    Hi, I'm a little confused by the labelling of the diagrams here. Is the command handler supposed to be the microservice? I noticed the label was changed midway through the video (3:24), which added to the confusion. If the command handler were a microservice, are you saying that if one operation were to fail, the second one may not be executed, leading to inconsistencies? So instead, the solution you showed is to create another microservice called AsyncProcess, whose sole purpose is to periodically check that the first operation has completed successfully (like with a cron job)?
    Dumb question: if the database write command was handled and rolled back, and the command is wrapped in exception handling, then there technically shouldn't be an event emitted, which would render your solution redundant. Am I missing something?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. The assumption is that the command handler exists inside of a microservice somewhere. However, the reality is that it could just as easily be inside of a monolith. It is not, by itself, a microservice.
      The async processor that is added later could exist as a separate microservice, but again, it doesn't have to. It could just be another thread inside of the original microservice. However, a common way to implement it (as shown in the video) is using something like a CDC process in which case it would be separate.

  • @paulfunigga
    @paulfunigga 2 months ago

    Great video, thanks

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      Wade here. You are welcome. Glad you enjoyed it.

  • @sarwanhakm7517
    @sarwanhakm7517 3 months ago

    Wade, don't you think "independent" should be used instead of "dependent" when mentioning that the two writes should be separated? The scanner simply checks for new events periodically; it is independent of the state writer and doesn't really care about what's being emitted.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. I think you certainly could make that argument. However, there is a dependency, regardless. If the first write doesn't happen, then the second write can't happen. One depends on the other.

  • @johncerpa3782
    @johncerpa3782 3 months ago

    Great video. Learned a lot

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. Glad to hear you learned something new, and thanks for the feedback.

  • @avalagum7957
    @avalagum7957 3 months ago

    Great! Then how does Change Data Capture work? If the CDC process reads data from a DB and publishes events, how does it know which data it has already read?
    Edit: if this problem can be solved completely without using a distributed transaction, does that mean that every problem that needs a distributed transaction can be solved in a different way?

    • @dandogamer
      @dandogamer 3 months ago

      Databases have hooks that can be used, e.g. an AfterInsert hook will trigger when a new row is added.

    • @avalagum7957
      @avalagum7957 3 months ago

      @@dandogamer Are you sure that's how CDC works? If so, can CDC only detect rows added to a table after CDC is deployed?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +2

      ​@@avalagum7957 Wade here. CDC solutions often have the option to send the existing state as well. So if you enable CDC after data has already been written to the table, then it can basically send the current state of the table, followed by any deltas. This does mean that you don't get the full history of events unless the table you are working from is Event Sourced or an Outbox table where you never deleted records. However, ideally, you'd solve the Dual Write problem on Day 1. In that case, CDC would be turned on from the very beginning, and you would get every change. This would only become an issue if you were retroactively adding CDC into the system.
      To answer your second question, I wouldn't necessarily go as far as to say that every distributed transaction problem can be solved without a distributed transaction. That can depend a lot on the business requirements. However, I would say that many distributed transactions can be replaced by other solutions.
      Distributed transactions are powerful, but they introduce a lot of their own complexities such as blocking operations. They require everything to be operational all of the time. One of the nice things about event-driven systems is that you can build them to allow portions of the system to be offline which leads to some interesting features. This can lead to more robust systems that can continue to operate, even during failure scenarios.

  • @loflog
    @loflog 26 days ago

    Is DDB Streams CDC or a transactional outbox?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 26 days ago

      Wade here. I'm afraid I can't answer that question directly. I will say that it's possible for something to be both. I.E. Using CDC to emit events from a Transactional Outbox is a completely valid approach. You might consider taking a look at my video on the Transactional Outbox Pattern. It might help answer your question. ua-cam.com/video/5YLpjPmsPCA/v-deo.html.

  • @xxXAsuraXxx
    @xxXAsuraXxx 3 months ago

    Is this where the outbox pattern can help?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +2

      Wade here. The Transactional Outbox pattern is definitely one of the potential solutions to the problem, as shown in the video.
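      The core of the pattern fits in one local transaction (a sketch in Python with sqlite3; the table names are illustrative):

      import json
      import sqlite3

      def place_order(conn: sqlite3.Connection, order_id: str) -> None:
          with conn:  # one local transaction covers both writes
              conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
              conn.execute(
                  "INSERT INTO outbox (payload, published) VALUES (?, 0)",
                  (json.dumps({"type": "OrderPlaced", "id": order_id}),))
          # A separate relay (a poller or CDC) later reads the outbox and
          # performs the Kafka write, retrying until it succeeds.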

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +3

      Wade here again. By the way, if the Transactional Outbox pattern is of interest to you, stay tuned. We'll be releasing a new video shortly covering the topic in more detail.

  • @TheLSS011
    @TheLSS011 2 months ago

    I would solve this problem by making the transaction idempotent and relying on Kafka's dead-letter replay mechanism. When dealing with asynchronous environments, it's crucial to make everything idempotent, especially database transactions.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago +1

      Wade here. Idempotency is definitely a good idea in these types of systems. When the source is Kafka, and the destination is Kafka, you can use idempotent database writes and keep redoing them until everything (including the downstream publish) succeeds. But for that to work, your upstream message has to be coming from something Kafka so that we can leverage its retry mechanisms. But what if it isn't? What if it comes from a REST call, or some other mechanism? Even if your upstream message comes from Kafka, how did that message get into Kafka in the first place? Did it have to deal with the Dual-Write problem? Basically, by relying on Kafka to do retries, you don't avoid the dual-write problem, you just push it upstream.
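      For the consumer side, a sketch of what an idempotent write can look like (Python with sqlite3; the event-id scheme is illustrative):

      import sqlite3

      def apply_once(conn: sqlite3.Connection, event_id: str, order_id: str) -> None:
          with conn:
              # The dedupe record and the state change commit together, so a
              # redelivered event becomes a no-op rather than a double apply.
              seen = conn.execute(
                  "SELECT 1 FROM processed_events WHERE event_id = ?",
                  (event_id,)).fetchone()
              if seen:
                  return
              conn.execute("INSERT INTO processed_events (event_id) VALUES (?)",
                           (event_id,))
              conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))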

    • @TheLSS011
      @TheLSS011 2 months ago +1

      @@ConfluentDevXTeam Thanks Wade! Very relevant counterpoints! I'm going to study the patterns you described. Your video is excellent, by the way. I've just subscribed to your channel.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago

      @@TheLSS011 Wade here. Glad you enjoyed the video. And thanks for hitting the subscribe button. Hopefully, you get even more value from some of the other videos on the channel.

  • @rustyprogrammer1827
    @rustyprogrammer1827 3 months ago

    But why can't we emit the event after the state is saved? I mean, we are dealing with a single microservice. Can anyone please explain?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. It's a single microservice, but it's writing to two different external storage solutions (Database + Kafka). As outlined in the video, the problem comes if you write the state to the database, and before you can emit the events, the microservice fails. It doesn't even need to be a big failure. A network problem that prevents the data from being written to Kafka would be sufficient. At that point, your two external storage solutions have conflicting data. The database thinks the action was completed. Kafka isn't aware of it. You've introduced an inconsistency.

    • @mateuszlasota3226
      @mateuszlasota3226 3 months ago

      @@ConfluentDevXTeam Awesome example. I was wondering "why doesn't he use a transaction to execute the business logic and, after it succeeds, emit the event" while watching your video, so thanks for the clarification.
      That edge case relates to partition tolerance from the CAP theorem, am I right?
      Do you plan to make a video about the CAP theorem with a real-life example?

  • @deadlyecho
    @deadlyecho 3 months ago

    I was just coding a side project with a Redis cache and Postgres and ran into a similar issue.

    • @flybyray
      @flybyray 3 months ago

      Why was it important to mention the Redis cache and Postgres? Nice that you ran into a similar issue. Do you want to talk about it?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. Yes, this problem is definitely not limited to specific technologies such as Apache Kafka. It can happen any time you write to two separate locations. So Redis and Postgres could absolutely encounter this problem.

    • @flybyray
      @flybyray 2 months ago

      @@ConfluentDevXTeam For me it is surprising that new wordings like „dual write“ arise for things which were discussed in different contexts already several decades ago. In the context of the dual write problem, a Petri net could be used to design a pattern where concurrent write operations are properly synchronized to avoid conflicts. This could involve, for example, introducing places and transitions that represent locks or other synchronization mechanisms. I'm OK with the new wordings if they help to tackle the problems.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 2 months ago +2

      @@flybyray Wade here. The term dual write is hardly new. It's been around for several years at least. What I like about it is that it describes the problem rather than being a buzzword. The problem may have been discussed for a long time, but unfortunately, it's not as well understood as we might like, especially among new developers. As systems become more distributed, it starts to arise a lot more than it did in the earlier days, which makes it critical to raise awareness. Unfortunately, just saying we could use a Petri net to design a pattern to solve the problem isn't really a solution. It doesn't actually explain how to solve the problem. Developers want concrete solutions such as the Transactional Outbox, Event Sourcing, or the Listen to Yourself Pattern. These are concrete ways to solve the problem with clearly defined mechanics.

  • @krishan7976
    @krishan7976 3 months ago

    Can't we just do this in the microservice, where we only emit the event when changes are successfully committed to the database, and handle emit failures via some feedback loop?

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. Unfortunately, no. When we talk about failure, we aren't just talking about getting an exception or error message. We are also talking about absolute failures such as application termination, hardware failures, etc. In these cases, there is no "feedback loop". The application was terminated. And when it comes back, you've lost the event. Solving it would require persisting the event before the termination, which brings us back to the potential solutions such as Event Sourcing, the Outbox Pattern, etc.

  • @jpkontreras
    @jpkontreras 3 months ago

    I immediately thought of a WAL.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. I mean, a Write-Ahead Log is basically just Event Sourcing. So yes, that's a possible solution.
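      A sketch of the shared idea (Python with sqlite3; the toy projection is illustrative): append the event first, then derive everything else from the log:

      import json
      import sqlite3

      def append_event(conn: sqlite3.Connection, event: dict) -> None:
          # The single write: the append-only log is the source of truth.
          with conn:
              conn.execute("INSERT INTO event_log (payload) VALUES (?)",
                           (json.dumps(event),))

      def project_state(conn: sqlite3.Connection) -> dict:
          # State, and any downstream Kafka publishes, are derived from the
          # log afterwards, so they can always be rebuilt or retried.
          state: dict = {}
          for (payload,) in conn.execute(
                  "SELECT payload FROM event_log ORDER BY rowid"):
              event = json.loads(payload)
              state[event["id"]] = event["type"]  # toy projection
          return state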

  • @ozlemelih
    @ozlemelih 3 months ago +2

    Write a monolith.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. Unfortunately, a monolith isn't a solution. You can encounter this problem in a monolith just as easily as you can with microservices. For example, if your monolith needs to write to the database and send an email, you've just encountered the Dual-Write problem. Or if your monolith needs to write to a database and Kafka. Or if your monolith needs to write to two separate databases. There are many ways this problem can occur because the reality is this isn't a microservices problem, it's a distributed systems problem.

  • @ArtemisChaitidis
    @ArtemisChaitidis 3 months ago

    Here we see segregation of responsibilities to achieve that.
    Although this is achievable within the same microservice too, by making sure you have a record in the database that the event has been sent downstream after you have emitted it.
    In case of failure, the event might be emitted twice, but as you said, the downstream system checks can handle it.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago

      Wade here. A separate microservice is definitely not a requirement here. What is required is a separate process or thread. If you record the event in the database, and then have a separate process/thread to ensure it gets emitted downstream (and make note of that), then you are implementing a solution such as the Transactional Outbox Pattern or Event Sourcing.

  • @mohsan5491
    @mohsan5491 3 months ago

    Idempotency: Idempotent Writer

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +1

      Wade here. Idempotency is potentially quite valuable in the consumers of the Kafka messages once they get there. However, it doesn't do much for solving the dual write problem directly.

  • @xcoder1122
    @xcoder1122 3 months ago +1

    I'm no microservices expert, but if that's a problem with microservices, then microservices suck. On our server, no event would be triggered unless the database write was successful, so it can never happen that we send out an event and the change is not reflected in the database. Nothing is easier than ensuring that a certain piece of code is executed after a database write succeeded, and only in case it did succeed. And in case the event cannot be emitted, well, that's the same issue as shown at 3:40. Then we can also retry at a later time. Of course, that may fail again, and it may fail forever, but the video doesn't provide a solution for this problem either.

    • @ConfluentDevXTeam
      @ConfluentDevXTeam 3 months ago +3

      Wade here. This is not a problem with microservices. It is a problem with distributed systems. This can also happen in a monolith that happens to use a DB + Kafka. Or even a monolith that uses a DB + sends an email, or a variety of other scenarios.
      You are saying that it is easy to guarantee that a piece of code (say emitting an event) is only executed after the DB has committed. That's absolutely true. That part is easy. What is hard is guaranteeing that both succeed or both fail. I.E. Guaranteeing that because the DB succeeded, the event being emitted is also going to succeed. If you implement retry logic to resend the event later in case of failure, then you are simply writing code to solve the dual write problem, which is exactly the point of the video. That's what CDC, the Outbox pattern, Event Sourcing, and the Listen to Yourself pattern all try to address. That's not to say there aren't other solutions, because there are. The problem comes when you assume that both operations will succeed, and don't implement any kind of retry logic.
      If you solve the dual write problem, then it's not a problem anymore. Just make sure you don't ignore it.