Are you using CDC? What's the use-case?
4:19 And CDC itself can also capture specific views instead of tables directly, so that would be another layer of separation. As the Service B owner I can change the internal schema however I like as long as I comply with my contract, so targeting CDC at a DB view could ease that process.
I agree with you. I am using the same approach in production: I do CDC at the view level instead of on a database table.
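To make the "view as contract" idea above concrete, here is a minimal sketch using SQLite from Python. The table, view, and column names are all hypothetical, and it only shows the contract side: whether a particular CDC tool can capture from a view depends on the tool and the database, so treat that part as an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal schema, version 1 (hypothetical names).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace')")

# The view is the published contract: consumers only ever see this shape.
conn.execute(
    "CREATE VIEW customer_contract AS SELECT id, full_name AS name FROM customer"
)


def read_contract(connection):
    """Read rows the way an outside consumer would, through the view only."""
    return connection.execute(
        "SELECT id, name FROM customer_contract ORDER BY id"
    ).fetchall()


before = read_contract(conn)

# Internal refactor: the owner splits the name into two columns,
# then redefines the view so the outside shape stays the same.
conn.execute("DROP VIEW customer_contract")
conn.execute(
    "CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customer_v2 VALUES (1, 'Ada', 'Lovelace')")
conn.execute("DROP TABLE customer")
conn.execute(
    "CREATE VIEW customer_contract AS "
    "SELECT id, first_name || ' ' || last_name AS name FROM customer_v2"
)

after = read_contract(conn)
```

Anything reading `customer_contract` sees the same rows before and after the refactor, which is the freedom the comment above is describing.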
"Data on the inside is different than data on the outside." Under-appreciated point, had not yet heard it articulated thus.
I also often say Private vs Public to get across the same gist.
Perfect timing 😂. This Saturday I will present my thesis, which covers IDM solutions that use event-driven architecture and Debezium.
Always a pleasure watching your videos - many thanks for sharing all this good knowledge
Thanks. Appreciate you watching.
This was an excellent explanation and very timely. Thanks!
Thank you for making this.
My pleasure!
Thank you Derek, awesomely explained as always!
Right now I am working on decoupling my own domains from a monolithic application into their own service. Some tables can just be moved, but it looks like I have to copy a few tables, as they are used in the infra layer in "my" domains. I think CDC might be a good option to keep the data consistent.
Depending on the usage, another option is providing an API that exposes that underlying data so you aren't accessing it from a replicated table (assuming that's what you mean).
I get the use case for CDC events that have a different structure, one that describes a business event instead of a CRUD event, but that translation from database logs seems challenging. What could be easier: when a transaction happens in Application A, we don't care what it persists to its own stateful data source. It can just publish the transaction to an event streaming platform, and the whole use case for CDC goes away.
Can you share some available open-source CDC tool/library?
Debezium, mentioned in video
Great video. Like Neal already mentioned, CDC probably shouldn't be used for integration events.
I wonder how you would integrate a warehouse inside a business, especially with multiple (external) services. Would you use CDC, ETL, ELT, or some other method?
Not mutually exclusive, could be a combination.
I understood CDC to be more for integration at the data layer: replication, data warehousing, ETL, etc. In the example of Service B using CDC to go through a translation layer to write a message to the message broker, would it not be simpler for the service layer to publish a message to the broker itself, since it has the business context at that stage, rather than introducing CDC and a translation layer for event-driven architecture? It seems like a complication to me, since all database updates should be going through the service anyway, as you point out. Great channel by the way!
Yes. While that would be ideal, if it's a legacy system, most start using CDC as a way to derive events because there may not be a single point of truth for making state changes. Meaning you have multiple applications or processes that might be changing data, without any control over whether they publish the required event.
@CodeOpinion OK yes, that makes sense for a legacy application or proprietary system where you don't have access to change the code. Now that I think of it, I have used this strategy in the past for an old-school POS system on its MSSQL database. Thanks for the reply.
@CodeOpinion But is this translation layer Service B's responsibility? It might be difficult for a legacy system to do it, and CDC is seen as a way to gather data from that system without making changes to it.
@MrSpyTubes In this particular use case, Service A is legacy, so there is no control over which process is writing to the DB and how. Service B is a newer, more flexible service and needs real-time events to update DWs, caches, etc. The CDC process needs to have some of the business logic, not only to read the CRUD events from the DB logs but to transform them into business events. That's only possible if the underlying data structure is understood and some business logic from the Service A application is applied to transform the CDC event before publishing it to the event streaming broker.
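A rough sketch of what that translation layer might look like, in Python. The change record is loosely modeled on the `op` / `before` / `after` envelope that Debezium emits; the `orders` table, its `status` values, and the event names are all hypothetical. Note how the function has to know Service A's data model to guess at intent, which is the difficulty raised in this thread.

```python
from typing import Optional


def to_business_event(change: dict) -> Optional[dict]:
    """Map a row-level change on a hypothetical `orders` table to a business event.

    Returns None when the change has no business meaning for consumers.
    """
    op = change.get("op")  # "c" = create, "u" = update, "d" = delete
    before = change.get("before") or {}
    after = change.get("after") or {}

    if op == "c":
        return {"type": "OrderPlaced", "order_id": after["id"]}

    # Only a status transition is treated as meaningful; this rule is
    # business logic borrowed from Service A (an assumption in this sketch).
    if op == "u" and before.get("status") != after.get("status"):
        if after.get("status") == "shipped":
            return {"type": "OrderShipped", "order_id": after["id"]}
        if after.get("status") == "cancelled":
            return {"type": "OrderCancelled", "order_id": after["id"]}

    # Any other update or delete is internal detail and is not published.
    return None
```

For example, an update that moves `status` from `paid` to `shipped` becomes an `OrderShipped` event, while an update that only touches some other column is dropped.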
Amazing video
We have a high number of transactions on SQL Server. Could enabling CDC affect overall SQL performance?
Late reply, but yes. CDC adds overhead and this could negatively impact the server.
Is the outbox pattern a specific implementation of CDC?
You could use CDC in that way to publish events. I would persist the actual events in the DB and have the CDC process publish those, rather than trying to infer and transform a data change into an event.
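A minimal sketch of that idea (commonly called the outbox pattern), using SQLite from Python. The schema and event names are hypothetical, and a simple polling `relay` function stands in for the CDC process, which in practice would pick up inserts on the outbox table from the database log instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, payload TEXT, "
    "published INTEGER DEFAULT 0)"
)


def ship_order(connection, order_id):
    """Change state and record the explicit event in ONE transaction."""
    with connection:  # commits on success, rolls back on exception
        connection.execute(
            "UPDATE orders SET status = 'shipped' WHERE id = ?", (order_id,)
        )
        connection.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("OrderShipped", json.dumps({"order_id": order_id})),
        )


def relay(connection, publish):
    """Stand-in for the CDC process: forward unpublished outbox rows in order."""
    rows = connection.execute(
        "SELECT seq, type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with connection:
            connection.execute(
                "UPDATE outbox SET published = 1 WHERE seq = ?", (seq,)
            )


conn.execute("INSERT INTO orders VALUES (1, 'paid')")
conn.commit()

ship_order(conn, 1)
sent = []
relay(conn, lambda event_type, body: sent.append((event_type, body)))
```

The event is written by the code that has the business context, so nothing has to be inferred from row changes. Because a row is marked published only after `publish` returns, a crash in between would resend it, so consumers should tolerate duplicates.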
It seems like CDC is a great choice in scenarios where we use events in a data-centric way, for data distribution or propagation. In other words, event-carried state transfer. Do you agree?
Yes, I'd say CDC is closer to event-carried state transfer than anything. I'm always very cautious with it because sharing data between services is not something I advocate. It depends on what the data is for and how it's being used.
@CodeOpinion So, I understand that not sharing data between services means not sharing the underlying data structure or data storage system, which couples services together and creates complexity that can't be easily managed or broken down. So sharing business events via CDC is the more desirable approach.