Game changer. In reality, people do build data products as a result of that friction in the centralized approach. They just don’t call it that. However, I loved the phrases interoperable and trustworthy. There needs to be a well-defined pattern around bringing consistency to inter-data-product interactions.
Absolutely, a crisp and clear explanation just from presenting the abstract of the concept. My perspective: though this is going to be a paradigm shift from lakehouse to data mesh, what is absolutely required is to understand the traps the architecture strategy team should be careful about.
Fantastic! Working for a steel company that is just dipping its toes into the Big Data platform field, aspects of optimizing resources with respect to data engineering are paramount moving forward.
Great job Zhamak! Very good information regarding data mesh. Easy to understand, with good ideas.
Thought provoking session on distributed analytics architecture. Thanks
Absolutely agree!
Very useful and thought provoking, thank you for sharing.
Fantastic, thought provoking and great guidance.
Very interesting presentation. The only risk I foresee is that when data products consume each other instead of consuming from a centralized lake, errors and bugs will spread through the network virtually uninhibited. In centralized data pipelines, it is still possible to simply purge and recreate a data product from immutable logs. But in this case, it will be extremely difficult to prevent the spread and correct it later. If one data product goes down, it will impact the whole network. This is a problem with microservices architecture, and it will be a problem in Data Mesh as well.
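The purge-and-recreate pattern mentioned above can be sketched in a few lines: because the event log is append-only and never mutated, a buggy derived dataset can always be thrown away and rebuilt by replaying the log from the start. The names here (`EventLog`, `rebuild_product`, the sample events) are purely illustrative, not from the talk.

```python
# Minimal sketch of rebuilding a derived data product from an immutable log.
# All names and data are hypothetical illustrations of the idea above.
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Append-only log: events are only ever appended, never changed."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

def rebuild_product(log: EventLog) -> dict:
    """Derive the current state from scratch by replaying every event in
    order (last-write-wins), so a corrupted derivation can be recreated."""
    state: dict = {}
    for event in log.events:
        state[event["key"]] = event["value"]
    return state

log = EventLog()
log.append({"key": "client_42", "value": "active"})
log.append({"key": "client_42", "value": "churned"})
print(rebuild_product(log))  # {'client_42': 'churned'}
```

In a decentralized mesh, by contrast, a consumer that has already ingested a bad upstream output has no single log to replay, which is the propagation risk the comment describes.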
Absolutely true. In la-la land this might work, but not in the real world. When we propose any architecture, we should always think about how seamlessly we can operationalize it, not just how we can build it. On top of that, you have to couple it with resourcing challenges.
Low coupling and high cohesion ... again :) Very nice presentation !!!
Lots of those problems were already solved a decade or more ago. Most data project failures today (which she did not mention, but which Gartner puts at around 70-80%) do not stem from a lack of technology.
Data mesh + Better Value Sooner Safer Happier for the win :-)
Very nice
I just think it would be great if there were examples of how all of this connects to real-world products. I mean, data products are commonly just legacy business products like an ERP, CRM, or MES system. What changes should be made in those products so they can adapt to the domain perspective? How would it impact the current user journey?
When I consider this, I either conclude that we still have lots of big gaps to cover, or I conclude that I didn't understand the concept.
I'm on the same wavelength: I believe it started as a new concept but ended with a standard data governance process. The speaker should include examples of how data product roles make modern data architecture governance agile.
The way I think of it: if you have a client_id in one of the Data Product/Pod teams revolving around Claims, the same client_id can have a Policy, but Policy will be owned by another Data Pod/Product team. Now, technically speaking, suppose you have table_policy in one workspace and table_claim owned by another team in another workspace; then you have to join them when you want to consume them for analytics. This is one weave of the data between pods. This is how I am envisioning it; more thoughts or better ideas would help me learn as well.
I may be wrong here, but this is how they are trying to move ownership and split the load across different Data Pod teams. Then you simply join that information for your analytics when needed in reports. The technical concepts remain the same, but accountability and responsibility are delegated to teams. This is how the Mesh is created. Straight away I can see this will create more dependencies and be costly to implement as well, just like with front-end teams. I have a feeling this will be too expensive to implement; that is the only negative I see. Suppose you have 25 applications in an organization: how are you going to split the ownership of those across the Data Pods? I have those questions in my mind. It's a very interesting approach, and like you I am also curious to work in that space now and learn from it. It is a mindset and a journey to embark on. A new paradigm; I hope this doesn't become a recruiter marketing gimmick.
Another problem: this can create huge dependencies between networks of Data Pods. See, this is not like K8s, where you can bring an app up in a flash; this is data with volume, which means it is massive in nature. Hence, if one Pod has issues, there will be a ripple effect on the other pods that are consuming from it. So this is creating new engineering problems.
Trying to solve a problem by creating a new one is clearly not good architecture, imho. Sometimes sticking to the basics and planning properly solves the problem simply.
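The cross-domain join described in this thread (table_claim owned by the Claims pod, table_policy by the Policy pod, joined on client_id for analytics) can be sketched with an in-memory SQLite database. The table and column names come from the comment above; the sample rows are made up.

```python
# Sketch: an analytics consumer joining two domain-owned tables on client_id.
# table_claim and table_policy mirror the names in the thread; data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_claim (client_id TEXT, claim_amount REAL)")
conn.execute("CREATE TABLE table_policy (client_id TEXT, policy_no TEXT)")
conn.execute("INSERT INTO table_claim VALUES ('c1', 500.0)")
conn.execute("INSERT INTO table_policy VALUES ('c1', 'POL-9')")

# The consumer weaves the two data products together at query time.
rows = conn.execute("""
    SELECT c.client_id, p.policy_no, c.claim_amount
    FROM table_claim AS c
    JOIN table_policy AS p USING (client_id)
""").fetchall()
print(rows)  # [('c1', 'POL-9', 500.0)]
```

In a real mesh the two tables would live in separate workspaces behind each team's published interface rather than in one database; the join logic, however, is the same shape.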
Love this! Hello to the new future of data management :)
Hey I'm watching this now and seeing your name :) Small world, how are you?
@@ozlemakpinarsel2674 hehe small world indeed! All good here, hope you are doing great as well
data mesh ~ data analog of "microservices" architecture for a mesh of services
Excellent presentation! I would definitely agree that maintaining a single data lake at scale for a large organization is a big challenge. I believe that some organizations built data lakes to avoid data duplication and also to build a “single source of truth”. How does this architecture address those problems?
If you look into data lake architectures, no importance is given to the domain of the input sources or of the consumers they serve. This is why data lakes are basically the "single source of truth" for all of the data in the organization, irrespective of the domain it belongs to. Now, if you logically break this down into data products according to their respective domains, then each of these data products is in fact the "single source of truth" for that particular domain. So instead of having a "single source of truth" for all domains, we have a "single source of truth" for each domain, in the form of data products.
@@AnkitAnchan666 Thanks. That makes sense.
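The per-domain single-source-of-truth idea in the reply above can be sketched as a registry that maps each dataset to exactly one owning domain. The domain and dataset names here are illustrative, not from the talk.

```python
# Sketch: every dataset has exactly one authoritative owning domain.
# Domain/dataset names are invented for illustration.
OWNERS = {
    "claims": {"table_claim"},
    "policy": {"table_policy"},
}

def source_of_truth(dataset: str) -> str:
    """Return the single domain that owns a dataset; raise if the dataset
    is unowned or claimed by more than one domain."""
    owners = [d for d, datasets in OWNERS.items() if dataset in datasets]
    if len(owners) != 1:
        raise LookupError(f"{dataset} must have exactly one owning domain")
    return owners[0]

print(source_of_truth("table_policy"))  # policy
```

The invariant enforced here (exactly one owner per dataset) is what lets each data product act as the single source of truth for its domain instead of the lake being the source of truth for everything.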
Good presentation! How will we manage cross-domain data in this data mesh approach?
@@AnkitAnchan666 - We are already doing this in the refined layer, or Gold Layer. I see the perspective now: just organize teams to use the lake and start building information with a data product mindset (ownership and accountability solved). Again, perspective matters, and where you are looking at things from matters. Situation, scenario, and people matter when making this choice. Speaking with my data enterprise architect hat on.
@Anuruthan Thayaparan - This is not an architecture change; it is just a mindset and a way of managing people, carving them into a particular scope. It is like creating TRUSTED DATA PRODUCT TEAMS in the GOLD/REFINED/TRUSTED LAYER of the data lake world.
Star models are built for technologies? No, these are optimized around business processes and are more understandable to end users. So what about the integration of business keys if you move the DWH to the source systems?
Persians are really smart yet friendly and nice. Zhamak is obviously no exception.
Data mesh? Another hype? We have gone from centralized to decentralized data, from centralized analytics to self-service analytics. There is no silver-bullet solution. Data mesh is mostly a cultural change and less about technology. And no matter the solution, the weakest link among the multiple data product teams determines success. Data governance is a challenge, and with cross-domain products, teams are dependent on each other. So what problem are we really trying to solve that we are not already able to solve in monolithic data silos? It doesn't matter whether you ingest data, cleanse it, and create data products in a decentralized or a centralized way; we are still talking about creating data processes that need to be developed, released, and maintained. Will it go faster? Will it add more quality? Will we be more flexible? Or are we creating chaos because we are not able to manage the multiple data product teams, which will eventually do whatever they want because they don't want to be dependent on other teams? So centralize where you have to, and decentralize where you don't. Five years from now we will be talking about the Data Labyrinth, because we are all lost in translation.
I don’t get it. Data mesh seems stupid to me
Complementary to any warehouse, lake, lakehouse, or mesh... or whatever new concept of data storage, governance, and security is thought up in the future.
Our AI/ML effortlessly harmonizes and contextualizes all data across all silos, even third-party or unstructured content like handwriting, images, etc.
If you want to achieve mesh at any scale, you'll want to talk to me.