This is a good look at mistakes that corporations make. It looks at architecture issues and the severe consequences they can have. Great job Jerry. Thanks for putting this up.
It sounds like they never did any integration testing during those 18 months. Good presentation, by the way: clearly spoken, with a good balance between detail and overview.
After watching the presentation and reading the comments, I decided to add this. In my personal opinion, the biggest problem in IT these days is that people cannot think outside the box. They can only think in the design patterns they've read about; they can only rely on books (basically, they only trust what other people have said). I understand that not everybody is smart enough or has good enough analytical skills, but get at least one person like this in your company. It's like getting a request to build a car that can transport wood from the forest. You go out into the world of libraries, design patterns and books, and you see all these shiny red Ferraris and Lamborghinis with great benefits. But for f*** sake, these cars are not made for transporting wood from the forest. Open your eyes and choose what is best for you, not what is "trendy". If you can't find such a car, go ahead and build it yourself by assembling smaller parts from different suppliers. Then there is also the misconception of building a "flexible system". When you build the car to get wood from the forest, don't make it "flexible" enough to get wood from a forest on planet Mars, because that will NEVER happen and will never be a valid use case.
There is a difference between reading books about well-proven concepts like Domain-Driven Design or design patterns and going with what is trendy. DDD was published in 2004 and still has great relevance. That should tell you something. I'm not saying you should believe every book written about software architecture, but there are some gems out there worth reading. Staying only in your own bubble of known concepts makes you stagnant. Why reinvent the wheel? It has been proven many times to be a great idea. Thinking outside the box still exists in a world where you learn from the past.
@@DevVader You're missing the point. It is true that those concepts have been well proven and are really good for some situations. But you should not restrict your mind to using what others have thought of. Instead, you should think about how a concept can suit your needs and how you might have to tweak it. Scrum is the best example of that. People use it for everything without even thinking about whether the concepts they have read about apply to their own situation, yet it can be adapted in many ways to better suit different situations/domains/contexts. So my advice is: search for well-proven concepts/features/whatever and use them intelligently, not blindly. That is just what I personally think. Every company / every project has different needs.
@@gregblastfpv3623 Yes, if that is what you meant with your first comment, I mostly agree. It definitely matters how you interpret those known concepts. But I also think that if you really understand concepts like DDD, you can pretty much rely on those sources. To me, fitting your understanding of the domain into a working model is where most of the outside-the-box thinking is needed. The problem is that, especially with Scrum but also with DDD, often only half of the concepts are actually used.
We ended up implementing a similar solution for our "search requests": an Elasticsearch cluster updated by triggers in the database. The main reason we went this route was so that none of the other 50+ services that modify the product data would need to be changed. Again, this is not ideal, but in our retroactive situation of "why do I have to wait 9 seconds for a search?" it worked really well. A+ talk, really great stuff. I wish this talk had been available 3 years ago ;)
Fantastic explanation, especially of the dependency inversion. Tons of talks out there still describe microservices along the lines of one database per service, where you have to make tons of API calls and have services calling other services. That is why Kafka and other messaging technologies don't show up in their designs/talks. The way you explained it makes a Kafka-like solution a no-brainer. Thx a ton.
Pablo Pazos Perhaps that explains the high network traffic. But it could also be a deadlock, if the recursive request required the same internal lock that had already been acquired by itself (i.e., from the same chain of API calls).
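A minimal single-process sketch of that deadlock, assuming a handler that takes a plain (non-reentrant) lock and then, through a chain of calls, re-enters itself. The `make_handler` name is hypothetical, and a timeout is used so the demo reports the deadlock instead of hanging forever.

```python
import threading

def make_handler(lock):
    def handle(depth):
        # A timeout turns a would-be deadlock into a visible result.
        if not lock.acquire(timeout=0.1):
            return "deadlock"
        try:
            if depth == 0:
                return "ok"
            # The recursive "API call" comes back into the same handler.
            return handle(depth - 1)
        finally:
            lock.release()
    return handle

print(make_handler(threading.Lock())(1))   # deadlock
print(make_handler(threading.RLock())(1))  # ok
```

Across a real network the recursive request usually arrives on a different thread or process, so a re-entrant lock would not help there; a common remedy is to avoid calling back into a service while holding its lock.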
There is generally one "system of record", and it is common to have many "sources of truth" for the same data. Not all sources of truth must be in sync with each other, but how closely they stay in sync is an SLA that the architect must decide. The system of record is the system that owns the data. A source of truth is just a system that can be trusted. A caching layer can be a source of truth.
This case is the classic guinea pig: a first-slide project in a presentation, a perfectly spherical sheep in a frictionless pasture of monoliths/microservices.
Watching this video, I understand the disconnect between the search and the configurators... My experience was that I could never find anything I wanted via search, because it only indexed the latest models... which is never what I'm looking for...
@@matthewabbott7168 What are you talking about? Each "API" had a 99.9% uptime. Looking at a call involving 200 distinct "APIs" the combined uptime is 81.9%, not 0%. So a request would fail with a probability of 18.1% because one of the "APIs" would be down. This is completely separate from response or service times.
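The arithmetic in this exchange can be checked in a few lines of Python. This sketch assumes the 200 dependencies fail independently of each other, which is exactly the assumption the next reply disputes.

```python
def combined_availability(per_call: float, calls: int) -> float:
    """Probability that all `calls` independent dependencies are up at the same time."""
    return per_call ** calls

a = combined_availability(0.999, 200)
print(f"all up: {a:.1%}, at least one down: {1 - a:.1%}")
# all up: 81.9%, at least one down: 18.1%
```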
This is not how "uptime" works. 99.9% does not mean that one randomly selected request out of 1000 fails. Imagine a file server that randomly deletes one out of 1000 files. That would be 0% acceptable for ANY usage. Uptime failures happen in chunks for some reason, like a server updating/restarting, network cables being rerouted, power outages, etc. So the service might be unavailable for an hour, but that will only happen once every couple of months. Within that hour, one million out of one million requests will fail, but at any other time one billion out of one billion requests succeed. This is also only a guarantee, not an absolute promise. So it might fail more often, but then you can request compensation, e.g. reduced usage fees.
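Put differently, 99.9% is a downtime budget rather than a per-request failure rate. A small sketch of that budget, assuming a 365-day year (and a 30-day month for the second figure); the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def allowed_downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Total downtime an availability target permits over a period."""
    return (1 - availability) * period_hours

print(round(allowed_downtime_hours(0.999), 2))               # 8.76 hours per year
print(round(allowed_downtime_hours(0.999, 30 * 24) * 60))    # 43 minutes per 30-day month
```

An hour-long outage every couple of months (about 6 hours a year) fits inside the yearly figure.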
4:00 My most hated saying in software, "if they gave you the feature, why not use it", is the physical equivalent of shooting yourself in the face and then saying "Well, no one stopped me!"
I think it's more of a problem with the origin of the requests. It's kind of a hard rule that a service-to-service call cannot result in another service-to-service call; otherwise you get an untenable cascade of requests. My own conclusion is that when you face the need to chain service requests, you should either duplicate data or change the process, refactor a domain, etc. Just don't make another service call. I think the problem is that companies that go ham on this architecture often head down that road to facilitate engineering growth, dividing teams along domain and service boundaries. At the point where you're looking at data duplication or refactoring another domain, you'll have to jump through PD hoops and push priorities that infringe on those of another team. The end result is... 9 pregnant women do not deliver a baby in 1 month. Also, the job is hard.
Wow... I'm still baffled / shell-shocked that "hire as many people as you can for your dev team (no matter what/why)" can be a KPI for the team manager... and not only that, it's a KEY indicator of career success. Not to mention that in that company quite a few IT managers turned a blind eye to the concept as a whole.
It becomes the KPI when their salary is determined by the size of their department and not by the results of said department. Most likely it's not an intended KPI, but it's one that matters to the manager. Sure, there might be some who question this, but unless they manage to explain why the current system creates bad incentives, which would include saying that all managers are more or less disloyal because they are growing their teams, things will not change. These kinds of situations usually occur due to a lack of understanding, and being questioned by lower-ranked staff is rarely taken well.
This shit happens because corporations are controlled by idiots. People with talent who can actually do work, do work. It's the idiots who end up in positions where they spend all day in meetings talking about arbitrary, pointless crap and never doing anything useful; those are the people who end up making decisions like "Hey, let's promote based on how many people they manage!" I'm sure they had a whole process for this one: they were probably drawing stick figures on post-its, listening to motivational music while writing on post-its and sticking them to the wall so they could further organize and filter the post-its, etc.
18 months, "hundreds" of developers... and NOT A SINGLE tester, integrator or build engineer? No dedicated test environments? Most people who had done ANYTHING in their lives would have pointed this out in their 2nd week on the job at the latest. Was there really not a single test release/build and basic smoke test?
Yes, it seems unbelievable, like someone designing all the parts of a car and not putting any two pieces together until all the parts have been designed and manufactured. On a sad note, I worked for a large bureaucratic organization whose upper management apparently fell for snake-oil promises of a "grand project that will magically replace and solve all the problems of existing legacy systems". After years of design work, and funding diverted from legacy projects to that savior project, I heard the only thing it had to show was a few prototype/toy applications that didn't do much of anything practical. So to some degree, I can almost believe it. Besides, if you successfully unit test everything, it *must* be perfect, right? ;)
Yeah, you name it. A company with a billion dollars in revenue always has a testing environment. This really makes the whole story look made up. Also, this kind of architecture makes no sense at all. Why should a microservice call other microservices? Microservices are obviously based on the Unix principle. But of course, in the Unix principle there has to be a central point where you combine the little programs to get higher functionality. Also, search is a read-only operation, right? So duplicating data is no problem at all.
I looked down the page until I found *someone* making this comment. You've literally saved what's left of my faith in the current state of the industry. I mean, WTF? How can people be so...? a) What did they think was going to happen? That all those service calls were going to happen in a time warp and be instantaneous? I don't think I've ever seen people so incompetent. *Any* experience as a developer will tell you that this was going to be the result... actually, probably any common sense at all! I'm pretty sure that if I took someone who had never coded before, only used a computer, they would question the wisdom of expecting reasonable performance from dozens of services calling each other for a single request! b) And as you pointed out... what about the testing? No one noticed? Surely there's more to this than that. Perhaps it was an architecture conceived in the mind of an over-promoted architect, who was warned by the engineers but didn't listen, coupled with an organisation so dysfunctional that it wasn't using any sort of CI at all. If this is the case, the topic of the talk is wrong: the uService disaster is a red herring, and any project would fail in such an environment. It should be a talk on the dangers of not doing some form of CI. Sigh...
Ohhh, no. We don't need any of these pessimistic developers and engineers. We're a solutions-oriented business, not problem-oriented. We push them to the back and fire them as fast as we can because they lack soft-skills.
It's all about data. Data architecture is the key for all applications/services. No matter what the fancy technology of the day is, in the end all roads lead to the data. Start your software architecture with the data architecture.
But the data is just data. Data structures sitting on disk don't have business behavior. That's what objects do. In fact, you could build the basic core of your system, your use-cases and business rules, and not even touch a database.
@@mattmarkus4868 Well, I don't know where to begin. This platform is not suited for a long conversation, but I will try. Your statement "Data is just data" is a tautology that does not help the conversation. Your comment about business rules that do not need a database is ridiculous. Firstly, I did not mention anything about databases (a specialized repository for data storage). Secondly, business rules act (make decisions) based on information/data: "IF (traffic light is RED) THEN STOP." I hope this helps. Thank you for your comment.
36:45 Since you don't own the replicated yet normalized data, how do you enforce authorization rules in this case, assuming search may require filtering some results for some users? Don't you eventually risk replicating authorization logic from your providers?
OMG! Back in the Seventies (services, let alone microservices, would not be invented for another few decades), there was a move towards code reuse, with modularization as the method to get there. The first few experiments clearly showed that too much modularization increased the path length through your code to such an extent that, on those old machines, your response times went through the roof. And that was at a time when everything happened on the one and only processor in your one and only system (not across a network...). I remember debugging a situation where including a few hundred lines from a file into an edit session took infinitely longer than editing the whole file. The answer: the module that got a line from that file hid its internal structure, which included opening and closing the file for each call, i.e. for each line you included. If you called the module to get the whole file, it only opened and closed the file once... yeah, well. Lesson: don't over-modularize; rather, build services with a meaningful task, and then understand what the "black box" services you call are actually doing.
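A small Python reconstruction of that anecdote, assuming that opening the file is the expensive step. The helper names are hypothetical, and the open counter stands in for real I/O cost.

```python
import os
import tempfile

class CountingOpener:
    """Wraps open() so the demo can count how often a file gets opened."""
    def __init__(self):
        self.opens = 0

    def __call__(self, path):
        self.opens += 1
        return open(path, encoding="utf-8")

def get_line(opener, path, n):
    # "Black box" helper: opens and scans the file on every single call.
    with opener(path) as f:
        for i, line in enumerate(f):
            if i == n:
                return line
    raise IndexError(n)

def get_all_lines(opener, path):
    # Coarser-grained service with a meaningful task: one open for the whole job.
    with opener(path) as f:
        return f.readlines()

def demo(num_lines=300):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "doc.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(f"line {i}\n" for i in range(num_lines))
        fine, coarse = CountingOpener(), CountingOpener()
        per_line = [get_line(fine, path, i) for i in range(num_lines)]
        whole = get_all_lines(coarse, path)
        assert per_line == whole  # same result, very different cost
        return fine.opens, coarse.opens

print(demo())  # (300, 1)
```

The per-line version also rescans the file from the top on every call, so its total read work grows quadratically with the number of lines.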
35:38 So in essence you're saying they created this microservice architecture to replace the big ball of mud that existed before, and instead you implemented another big ball of mud that grabs files from some random file server and requires database triggers that no one knows about? Honestly, that's a pretty shitty result.
So, the basic design did work. Once the database access time problem was fixed, the application was OK. A very long presentation to say very little other than how brilliant the speaker was.
It's funny how the takeaway is totally non-technical. Ironically, the last few companies I worked for that wanted to produce microservices got a presentation on how their company culture was preventing them from designing tools that functioned well. We could distill this even further: sales culture seeks to extract value from customers; engineering culture seeks to provide value to customers!
@12:22 First, I don't get the linear addition of times. Some calls happen in parallel in the graph, so we need to use the longest path in the graph, not the total number of nodes. Second, if we have 200 elements needed for a solution, each with a 0.1% error rate, isn't the overall error rate 1-(0.999)^200, about 18%? Please correct me if I'm wrong.
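The "longest path" point can be sketched as a critical-path calculation. This assumes every service fires its downstream calls in parallel and has a fixed processing time; the service names and latencies are made up for illustration.

```python
from functools import lru_cache

def critical_path_ms(latency, calls, root):
    """Latency of `root` when each service calls its dependencies in parallel.

    latency: service -> its own processing time in ms
    calls:   service -> services it calls (the graph must be acyclic)
    """
    @lru_cache(maxsize=None)
    def finish(svc):
        deps = calls.get(svc, [])
        # Parallel fan-out: a service waits only for its slowest dependency.
        return latency[svc] + max((finish(d) for d in deps), default=0)
    return finish(root)

# Hypothetical call graph.
latency = {"search": 5, "catalog": 20, "pricing": 30, "inventory": 10, "fx": 15}
calls = {"search": ["catalog", "pricing"], "catalog": ["inventory"], "pricing": ["fx"]}

print(sum(latency.values()), critical_path_ms(latency, calls, "search"))  # 80 50
```

When the calls are strictly sequential, the graph degenerates into a chain and the critical path equals the plain sum, which is the situation the talk describes.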
Actually, one comment below by @matthewabbott7168 explains it all: "Except those calls are all part of the same request, and in aggregate eat up the timeout. 19% failure rate + 81% timeout rate = 0% uptime." Basically, even if we have 81% uptime, remember the SLA's guaranteed upper bound of 150 ms response time, which cannot be met even when each of the 200 requests replies within 1 ms, because they are all chained: 200 sequential calls at 1 ms each already take 200 ms.
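The chained-latency arithmetic as a tiny sketch, assuming strictly sequential calls with a fixed per-call latency and no network overhead beyond it:

```python
SLA_MS = 150          # response-time upper bound quoted in the comment
CHAINED_CALLS = 200

def chained_latency_ms(per_call_ms, calls=CHAINED_CALLS):
    """End-to-end latency when every call waits for the previous one (no parallelism)."""
    return per_call_ms * calls

print(chained_latency_ms(1.0))   # 200.0, already over the 150 ms bound
print(SLA_MS / CHAINED_CALLS)    # 0.75, the most each hop may take, network included
```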
The speaker may sound a bit judgemental to someone who does not have corporate experience, but what he says is so true! The part about the laws is super important. The connection between how organizations "behave" internally and the systems they design is fundamental! Been there, done that. Barclays, Deutsche Bank, UBS, you name it... The IT department in pretty much any big financial institution is such a swamp, even though salaries are usually higher than in the rest of the market. It's almost impossible to produce worthwhile systems with unhealthy goals and metrics and crappy managers in place, no matter how amazing some individual contributors are and what they suggest. Pretty much no one gives a crap. In the worst case, managers might want to get rid of you if your efforts make it obvious that head count can be significantly reduced.
The funny thing is that how to avoid wasting 18 months just to find out nothing works was described in Winston Royce's fundamental paper "Managing the Development of Large Software Systems", published in 1970.
Recently I was speaking with a devops engineer in a job interview. I mentioned (off-topic, not related to the job) that I was thinking about creating a special session (and object) cache that would be much faster than the usual solutions, and I asked what he thought about it and whether he had any comments or guidance. He told me that since microservices are now becoming popular, no one will want such a cache, because everyone "has at least a 1 Mbit connection" and it's only a few milliseconds of difference, etc. Never mind that I did want such a cache, because I was working with really big finance-related forms and really needed a session to avoid overloading connections and to keep responses real-time; microservices wouldn't have worked for me at all for those particular forms. But I didn't want to argue in a job interview, so I dropped the topic. I was actually stunned that he would say such a thing about connection speeds, especially because his company was making web applications for public transportation companies. Also, I always believed that with microservices one has to take special care with speed, because of additional, irremovable latencies that will be far above a few ms no matter what you do. I studied some communication electronics, and I know that anyone who assumes high transfer rates for mobile phones (which are used by travelers, the clients of transport companies, obviously) is in for a big surprise, even with 4G or 5G. That is simply because while you are traveling, you have no guarantee of getting any connection at all, and mostly you will get lower, rather than higher, transfer rates because of the physics of electromagnetic wave propagation.
These 4G and 5G technologies do not actually give any transfer rate guarantees at all; they just raise the upper cap on transfer rates in particular circumstances, and can still fall back to the old 50 Kbit/s and 500 ms latency when the conditions for better speed are not met (and they are rarely met for travelers). Anyone can check that on any train, but I already know that this devops engineer hasn't been on a train for a long time :P I believe microservices is a dangerous architecture nowadays, because so many coders don't take any speed considerations into account. The biggest error in microservices is that there is no emphasis on "time budgeting". Time budgeting is a term from electronics and mechanical engineering. It simply means that you first set a time limit for the whole action (like page loading), then divide it among all the stages of the action's execution (like latency, downloading content, executing logic), and try to keep each stage within its limit.
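A minimal sketch of time budgeting as described above. The stage names follow the comment; the 1 s total and the per-stage numbers are made-up assumptions (the 500 ms latency slice echoes the mobile fallback figure mentioned).

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    budget_ms: float

def check_budget(total_ms, stages, measured_ms):
    """Check a time budget.

    Returns (plan_fits, overruns): whether the per-stage slices add up to no more
    than the total, and the names of stages whose measured time exceeded their slice.
    """
    plan_fits = sum(s.budget_ms for s in stages) <= total_ms
    overruns = [s.name for s in stages if measured_ms.get(s.name, 0.0) > s.budget_ms]
    return plan_fits, overruns

# Hypothetical 1 s page-load budget split into stages.
stages = [Stage("latency", 500), Stage("download", 300), Stage("logic", 150)]
measured = {"latency": 620, "download": 250, "logic": 90}
print(check_budget(1000, stages, measured))  # (True, ['latency'])
```

The measured total (960 ms) still fits the 1 s limit; per-stage slices exist to flag the latency overrun before the other stages grow into the remaining slack.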
Organizations want everything to be loosely coupled. The technical effort, time and resources we put into that have no real advantage apart from profiting the cloud industry.
Thinking about Conway's and Jimmy's Laws, I suspect I now understand why so many implementations of standard software (e.g. SAP, Siebel (in its day), etc.) fail... Maybe the SAP guys in the early days had a point when they designed their system to model the "standard, maybe even best-practice" organization. That leads me to Gerd's Law: "There is a good possibility that there are very good reasons why (for example) SAP doesn't exactly match your business's organization; maybe (for example) SAP is right."
So... their solution, their one chance in 15 years to break away from an old system! The thing most developers would kill for, a clean slate... is now held together with DB triggers, which were set up under the table with some DBA who enjoyed fine whiskey... Some other DBA will come along in 4-5 years and be like :? WTF are these triggers for!?
THAT. Jesus, I thought the design would be a bit more modern, but this seriously smells. It's nice to bash software that worked for 15 years (something one should never underestimate), but if your solution is based on replicating data, daily batch jobs with a backup devops team, and DB triggers, I'm not sure who should be laughing.
If a DBA deletes a trigger before he understands what it is for, he doesn't deserve his job. The last one seems to have been fired for his inability to document.
Interesting that you inverted pricing, as in, made the search service store price details within its search engine, I guess. Pricing will be required not just by search but by a lot of other services, e.g. the product details UI, order UI, cart/checkout UI, etc. Do you recommend that each of these services store prices within their own systems? And when there is a price misalignment or a forex change, etc., do you expect all the services to re-sync prices?
"Original architect left to spread his mess somewhere else" They probably didn't hire someone with the required knowledge and just pushed the task on some mid level dev who was hopelessly out of his depth. I'd have left too.
I can't believe there was no integration test environment prior to production. Bashing other developers and systems so publicly while glorifying your solution (to just one problem area) is not a smart thing to do. Additionally, saying that the way to get good systems is to change the company structure, and then all will be good, is bad advice. You just need good IT specialists who know what they are doing and top business process people who know what they are doing as well. You performance-tuned a service: good for you :-)
I don't know if I misunderstood Dell's requirements or something, but why does Dell want everything in microservices? Can't they build a monolithic application and have some microservices for the components that need them?
Isn't 99.9% uptime across 200 calls the same as 0.999^200, or about 82% uptime, for all 200 calls taken together? Like, the odds of at least one call failing are about 18%? I don't think it's even close to 100%.
It's the chance of succeeding 200 times in a row with a 99.9% success chance for each, since even a single failure causes complete failure. That gives (0.999)^200 = 0.82, an 82% chance to succeed, and 1-(0.999)^200 = 0.18, an 18% chance to fail.
Good talk, nice to see some retrospectives. Also, checking whether a microservice has a business entity to represent is a good way to tell whether it is needed or will ever be finished.
What is the full title/author of the "DDD" book mentioned in the description of Bounded Context about 21 minutes into the talk? I am guessing it is Domain-Driven Design by Eric Evans. Is this correct?
I don't understand the "solution" at 30:45. Instead of removing the dependency, he's only reducing it to "once a day", with the added note "or more, if required"... That just seems so silly to me???
Well, if we just take a look at a computer, we'll see that computers are basically designed for data duplication. If you execute an application stored on a hard drive, parts of it will be copied into RAM, and then smaller parts will be copied into the CPU caches. A single piece of code has to travel through 3 different storage layers in order to be executed; in the end, the same data has 3 replicas. As long as there is a reason for it and it is done well, data duplication is not a bad thing.
So the story is really an implementation of a distributed mainframe: they did the same thing they did on the mainframe, but with API calls. I see this approach with NoSQL technologies too, where databases are implemented as if they were relational. The whole project conception is wrong, but hey, people got paid :)
How does 'Search polls Catalog' (or 'Catalog pushes to Search') mean that Catalog is dependent on Search? In what way is that dependency inversion (which I assume means you take the converse of 'Search depends on Catalog')? By that logic, the search service is dependent on the customer's laptop, because that's where Search's information is going. What a perversion of the word "dependency".
Very informative, but there's something I didn't get. When reversing the arrows, does that mean the "search" microservice has a sort of crontab querying the other services once a day to get data? He drew a "Denormalizer" outside of the search microservice, but what is this in concrete terms? It feels like it should be 100% specific to the microservice's database it's inserting data into.
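In concrete terms, a denormalizer can be as small as a scheduled job that pulls from the owning services and writes flattened documents into the search service's own store. This is a sketch under assumptions, not the speaker's implementation: the two fetch functions are hypothetical stand-ins for catalog and pricing calls, and an in-memory dict plays the search index.

```python
from typing import Callable, Dict, List

def denormalize(
    fetch_models: Callable[[], List[dict]],
    fetch_prices: Callable[[], Dict[str, float]],
) -> Dict[str, dict]:
    """Build flat, search-ready documents from two upstream sources.

    Intended to run on a schedule (e.g. a daily cron job), so queries never
    call the upstream services directly.
    """
    prices = fetch_prices()
    return {
        m["id"]: {
            "id": m["id"],
            "title": m["name"],
            "price": prices.get(m["id"]),  # None until pricing has an entry
        }
        for m in fetch_models()
    }

def search(index: Dict[str, dict], term: str) -> List[str]:
    """Query-time work touches only the local, denormalized copy."""
    term = term.lower()
    return sorted(doc_id for doc_id, doc in index.items() if term in doc["title"].lower())

# Hypothetical stand-ins for the catalog and pricing services.
def fetch_catalog() -> List[dict]:
    return [{"id": "m1", "name": "Thin Laptop 13"}, {"id": "m2", "name": "Monitor 27"}]

def fetch_pricing() -> Dict[str, float]:
    return {"m1": 999.0}

index = denormalize(fetch_catalog, fetch_pricing)
print(search(index, "laptop"))  # ['m1']
print(index["m2"]["price"])     # None
```

Because the document shape is dictated by what search needs, it makes sense for the search team to own this job even if it runs outside the search service process.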
Yeah, in the example he said they polled once a day, but what you really want to do is use domain events to update the search data, so that it stays as close to live as possible.
As described, the architecture doesn't actually "reverse the arrows." Instead, they decoupled the arrows temporally -- they moved it to background jobs. Actually "reversing the arrows" would be publishing an "update search" interface that the other services implemented: when a new model is added, an event is pushed to update the search, and so on. That's the biggest flaw in this presentation.
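A minimal sketch of the push variant, assuming an in-process event bus as a stand-in for a broker like Kafka; the topic name `model-added` and the class names are hypothetical. Catalog publishes to a topic and never references SearchIndex, so the only thing both sides depend on is the event contract.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBus:
    """Tiny in-process stand-in for a message broker."""
    def __init__(self):
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subs[topic]:
            handler(event)

class Catalog:
    """Owns model data and announces changes instead of answering search queries."""
    def __init__(self, bus: EventBus):
        self.bus = bus
        self.models: Dict[str, str] = {}

    def add_model(self, model_id: str, name: str) -> None:
        self.models[model_id] = name
        self.bus.publish("model-added", {"id": model_id, "name": name})

class SearchIndex:
    """Keeps its own copy, updated as events arrive."""
    def __init__(self, bus: EventBus):
        self.docs: Dict[str, str] = {}
        bus.subscribe("model-added", self.on_model_added)

    def on_model_added(self, event: dict) -> None:
        self.docs[event["id"]] = event["name"]

    def find(self, term: str) -> List[str]:
        return sorted(i for i, n in self.docs.items() if term.lower() in n.lower())

bus = EventBus()
search_index = SearchIndex(bus)
catalog = Catalog(bus)
catalog.add_model("m1", "Thin Laptop 13")
print(search_index.find("laptop"))  # ['m1']
```

A real broker adds what this sketch leaves out: durable delivery, ordering per key, and replay for rebuilding the index from scratch.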
This guy tells it like it is in the crazy realm of my corporatocracy: "There are people who invent a problem and then create a team that specializes in mitigating it without ever solving it completely, so that these people can further their careers at the expense of everyone else and the company as a whole." Truer words have never been spoken, yet most companies would door-slam this guy because he's being "negative". All the better for guys like me, for I would hire him in a heartbeat, ha! He knows his craft in depth, loves and cherishes it, and shares his knowledge. He's the benevolent dictator who doesn't mess around with computer science and knows when and how to set a healthy boundary when technical debt appears, like cancer at an early stage. Thumbs up and hats off for keeping software engineers well-employed and happy in challenging projects, man!
Data that is ready to be fetched is probably the safest way. I was trying to figure out how to build a CRUD app, but then became aware that API keys in source code are bad practice, so they have to go in a .env file. I still couldn't get it to work, as it kept showing up as red in Android Studio (Kotlin). Then I researched it and saw that .apk files can still be reverse engineered to reveal the keys. The best way, then, is to get the information from somewhere else that is more secure and managed, but I'm quite new, so there could be better ways. It doesn't take very many steps to make a proper JSON call lmao
Great talk, and a perfect point at the end: IT is never a standalone thing; it must be considered merely a reflection of the actual processes. But I think it would've been nicer not to name the company and to keep it anonymous.
This is about Dell, isn't it? The same Dell whose Indian tech support told me that I should not keep a laptop on my lap, because it's designed to sit on a desk and otherwise will not function correctly?
Yes, it's one new microservice that doesn't call the other systems all the time and is self-contained to do one thing well: search. So they did a good job on this.
Yeah, and they went from paper to electronic to prevent stale data, because they were losing money on low prices... The analogy fails, but I get his drift. We've tried data replication and it's stable, but you fight missing updates. It needs a good document DB that is not hindered by reindexing and searcher reloads.
I like the parts where he's talking about software more than when he's dunking on other developers, corporate as they are. The hypothesis that the system is a copy of the organization implies that it's not entirely their fault.
It seems that his team is dysfunctional, and he is just blaming the application's architecture (monolithic or microservices). He pointed out that the project manager left early, before he could get fired, which is typical of a rotten project manager. A good team can make things work with a monolith or with microservices, while a bad team can make spaghetti microservices like the ones he described in horror.
Well it was the obsession with the fashionable architecture that pushed those forces. Also, I'd say a good team is one that communicates well. That's probably 80% of it. The x architecture shouldn't be the aim. The architecture should reveal itself naturally as a good team of people try to solve specific problems.
Well done for calling it out. 99% of expensive architects make the mistakes mentioned all the time. It's insane.
What's so Common about sense anyway?
@Calin B Couldn't have said it better.
There is a difference between reading books about well proven concepts like Domain-Driven Design or design-patterns and going with what is trendy. DDD was published in 2004 and still has great relevance. That should tell you something. I'm not saying believe in all books written about software-architecture but there are some gems out there worth to be read. Only staying in your own bubble of known concepts makes you stagnant. Why reinvent the wheel? It has been proven many times to be a great idea.
Thinking outside the box still exists in a world, where you learn from the past.
@@DevVader You're missing the point. It is true that those concepts have been well proven and are real good for some situations. But you should not restrict your mind to using what others have thought of.
Instead you should think of how it can suit your needs and how you would eventually have to tweak it.
Scrum is the best example of that. People use it for everything without even thinking about whether the concepts they have read about apply to their own situation, though it can be adapted in many ways to better suit different situations/domains/contexts.
So my advice is: search for well-proven concepts/features/whatever and use them intelligently, not blindly. That is just what I, personally, think.
Every company / every project has different needs.
@gregblastfpv3623 Yes, if that's what you meant with your first comment, I mostly agree. It definitely is important how you interpret those known concepts.
But I also think that if you really understand concepts like DDD, you can pretty much rely on those sources. To me, fitting your understanding of the domain into a working model is where most of the outside-the-box thinking is needed. The problem is that, especially with Scrum but also with DDD, often only half of the concepts are actually used.
We ended up implementing a similar solution for our "search requests": an Elasticsearch cluster updated by triggers in the database. The main reason we went this route was so that none of the other 50+ services that were modifying the product data would need to be changed. Again, this is not ideal, but in our retroactive situation of "why do I have to wait 9 seconds for a search" it worked really well. A+ talk, really great stuff. Wish this talk had been available 3 years ago ;)
I was with you all the way up to db triggers. It's a simple solution but has big ramifications when issues crop up.
Fantastic explanation, especially of the dependency inversion. Tons of talks out there still describe microservices as one database per service, where you have to make tons of API calls and have services calling other services. That is why Kafka and other messaging technologies don't show up in their designs/talks. The way you explained it makes a Kafka-like solution a no-brainer. Thx a ton.
Wow this is a nice video! I like how the speaker shared a real world example and shared the steps they took to approach microservices. Subbed.
"A broken, dysfunctional organization driven by meeting unhealthy goals and metrics will produce broken, dysfunctional systems." lol
That's so true...
Conway's Law in action :)
a somewhat related quote by Elon Musk: "The product errors reflect the organizational errors"
@ktxed He's great at repackaging other people's work
That's literally every software company
"and now we had a distributed deadlock: it was really, really fun"
Yep, basically because that is not a deadlock but infinite recursion.
Pablo Pazos, perhaps that explains the high network traffic. But it could also be a deadlock if the recursive request required the same internal lock that had already been acquired by itself (i.e. from the same chain of API calls).
One could say that infinite recursion is a point where no progress can be made... so therefore it's a deadlock.
Wouldn't that be a livelock?
@JediOfTheRepublic But infinite recursion is not "a point where no progress can be made" at all. So it's not a deadlock.
There is generally one "system of record", and it's common to have many "sources of truth" for the same data. Not all sources of truth must be in sync with each other, but that's an SLA that the architect must decide. The system of record is the system that owns the data. A source of truth is just a system that can be trusted. A caching layer can be a source of truth.
This case is the classic guinea pig, a first-slide project in a presentation, a perfectly spherical sheep in a frictionless pasture of monolith/microservices.
I think "Bell" was a code name for Dell, as I can see he worked there on their e-commerce website 10 years ago :P (2007-2008)
It's so satisfying to know that a hardware company failed miserably at software
It kind of puts a check on their smug faces.
It shows a number of good reasons why hardware and software are not the same ...
Watching this video, I understand the disconnect between the search and the configurators... My experience was that I could never find anything I wanted via search because it only indexed the latest models... which is never what I seek...
To get a good systems architecture you need to perform the "Inverse Conway Maneuver" - brilliant!
He had me at "no data duplication" :D :D :D
So they went to pre-production without any QA/test team taking it through basic sanity checks?
99.9% uptime across 200 calls = 0.999^200 ≈ 0.819 = 81.9% availability. Fairly bad, but not 0%.
Yeah, I thought the same...
Except those calls are all part of the same request, and in aggregate eat up the timeout. 19% failure rate + 81% timeout rate = 0% uptime.
@matthewabbott7168 What are you talking about? Each "API" had 99.9% uptime. Looking at a call involving 200 distinct "APIs", the combined uptime is 81.9%, not 0%.
So a request would fail with a probability of 18.1% because one of the "APIs" would be down.
This is completely separate from response or service times.
The calls aren't independent
This is not how "uptime" works. 99.9% does not mean that one randomly selected request out of 1000 fails. Imagine a file server that randomly deletes one out of 1000 files. That would be 0% acceptable for ANY usage. Uptime failures happen in chunks for some reason, like a server updating/restarting, network cables being rerouted, power outages, etc. So the service might be unavailable for an hour, but that will only happen once every couple of months. Within that hour, one million out of one million requests will fail. But at any other time, one billion out of one billion requests succeed.
This is also only a guarantee, not an absolute promise. So it might fail more often, but then you can request compensation, e.g. reduced usage fees.
4:00 My most hated saying in software, "if they gave you the feature, why not use it", is the physical equivalent of shooting yourself in the face and then saying "Well, no one stopped me!"
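A quick Python sketch of the two readings of "99.9% uptime" argued over in this thread. It assumes 200 strictly required dependencies with identical availability. The independent case multiplies the probabilities; the fully correlated case is an idealized opposite extreme in which every dependency is down during the same outage window, as in the comment above. Real systems fall somewhere in between, and neither number covers the timeout problem raised earlier.

```python
def chained_availability(per_call: float, calls: int) -> float:
    """Availability of a request that needs every one of `calls` dependencies,
    assuming their failures are independent of each other."""
    return per_call ** calls


def correlated_availability(per_call: float, calls: int) -> float:
    """Idealized opposite extreme: all dependencies are down in the same
    window, so the whole chain is exactly as available as one dependency."""
    return per_call


independent = chained_availability(0.999, 200)
print(f"independent: {independent:.4f} available, {1 - independent:.4f} failing")
print(f"fully correlated: {correlated_availability(0.999, 200):.4f} available")
```

With independent failures this gives roughly 81.9% availability (about 18.1% of requests failing); with perfectly correlated outages it stays at 99.9%.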
Yeah like "hey this nail fits perfectly into the wall outlet, surely so that I can stick it in"
200 API calls per end user request? Yeah now you're making a macroservice architecture
I think it's more of a problem with the origin of the requests. It's kind of a hard rule that a service-to-service call cannot result in another service-to-service call. Otherwise you get an untenable cascade of requests. My own conclusion is that when facing the need to chain service requests, you should either duplicate data or change the process, refactor a domain, etc. Just don't make another service call. I think the problem is that companies that go ham on this architecture often head down that road to facilitate engineering growth, dividing teams along domain and service boundaries. At the point where you're looking at data duplication or refactoring another domain, you'll have to jump through PD hoops and push priorities that infringe on those of another team. The end result is... 9 pregnant women do not deliver a baby in 1 month. Also, the job is hard.
200 API calls is pure insanity, but not macroservice architecture.
Wow ... I'm still baffled / shell-shocked that "hire as many people as you can for your dev team (no matter what/why)" can be a KPI for the team manager... and not only that, it's a KEY indicator for career success.
Not to mention that in that company quite a few IT managers turned a blind eye to the concept as a whole.
It becomes the KPI when their salary is determined by the size of their department and not by the results of said department.
Most likely not an intended KPI, but one that matters for the manager.
Sure, there might be some who question this, but unless they manage to explain why the current system results in bad motivation, which will include saying that all managers are more or less disloyal since they are growing their teams, things will not change.
These kinds of situations usually occur due to a lack of understanding, and having lower-ranked staff question you is rarely taken well.
This shit happens because corporations are controlled by idiots. People with talent who can actually do work, do work. It's the idiots who end up in positions where they spend all day in meetings talking about arbitrary pointless crap and never doing anything useful; those are the people who end up making decisions like "Hey, let's promote based on how many people they manage!" I'm sure they had a whole process going for this one, probably drawing stick figures on post-its, listening to motivational music while writing on post-its and sticking them to the wall so they could further organize and filter the post-its, etc.
18 months, "hundreds" of developers... and NOT A SINGLE tester, integrator, build engineer? No dedicated test environments? Most people who had done ANYTHING in their lives would have pointed this out by their second week on the job at most. Was there really not a single test release/build and basic smoke test?
Yes, it seems unbelievable, like someone designing all the parts to a car and not putting any two pieces together until all the parts have been designed and manufactured.
On a sad note, I worked for a large bureaucratic organization with upper management that apparently fell for snake-oil promises of a "grand project that will magically replace and solve all the problems of existing legacy systems". After years of design and of funding diverted from legacy projects to that savior project, I heard the only thing it had to show was a few prototype/toy applications that didn't do much of anything practical. So to some degree, I can almost believe it.
Besides, if you successfully unit test everything, it *must* be perfect, right? ;)
Yeah, you name it. A company with billion-dollar revenue always has a testing environment. This really makes the whole story look made up. Also, this kind of architecture makes no sense at all. Why should a microservice call other microservices? Microservices are obviously based on the Unix principle. But of course, in the Unix principle there has to be a central point where you combine the little programs to get higher functionality. Also, the search operation is a read-only operation, right? So duplicating data is no problem at all.
hahaha maybe they thought it would go from their laptops straight to production and everything would be fine :D
I looked down the page until I found *someone* making this comment. You've literally saved what's left of my faith in the current state of the industry. I mean, WTF? How can people be so...?
a) What did they think was going to happen? That all those service calls were going to happen in a time warp and be instantaneous? I don't think I've ever seen people so incompetent. *Any* experience as a developer will tell you that this was going to be the result... actually, probably any common sense at all! I'm pretty sure if I took someone who had never coded before, only used a computer, they would question the wisdom of expecting reasonable performance from dozens of services calling each other for a single request!
b) And as you pointed out... what about the testing? No one noticed? Surely there's more to this than that. Perhaps it was an architecture conceived in the mind of an over-promoted architect who was warned by the engineers but didn't listen, coupled with an organisation so dysfunctional as to not be using any sort of CI at all. If this is the case, the topic of the talk is wrong: the uService disaster is a red herring, and any project would fail in such an environment. It should be a talk on the dangers of not doing some form of CI.
Sigh...
Ohhh, no. We don't need any of these pessimistic developers and engineers. We're a solutions-oriented business, not problem-oriented. We push them to the back and fire them as fast as we can because they lack soft-skills.
It’s all about data. Data architecture is the key for every application/service.
No matter what the fancy technology of the day is, in the end all roads lead to the data. Start your software architecture with data architecture first.
But the data is just data. Data structures sitting on disk don't have business behavior. That's what objects do. In fact, you could build the basic core of your system, your use-cases and business rules, and not even touch a database.
@mattmarkus4868
Well, I don't know where to begin. This platform is not suited for a long conversation, but I will try. Your statement, "Data is just data.", is an example of a tautology that does not help the conversation. Your comment about business rules that do not need a database is ridiculous.
Firstly, I have not mentioned anything about databases (a specialized repository for datastore).
Secondly, business rules act (make decisions) based on information/data: "IF (traffic light is RED) THEN STOP."
I hope this helps.
Thank you for your comment.
36:45 Since you don't own the replicated yet normalized data, how do you enforce authorization rules in this case, assuming search may require filtering some results for some users? Don't you eventually risk replicating authorization logic from your providers?
Added +1 like for bringing me to real world environment understanding, but tempted for -1 for professional ethics ...
OMG! Back in the Seventies (services, let alone microservices, would not be invented for another few decades) there was a move towards code reuse, with modularization as a method to get there. The first few experiments showed clearly that too much modularization increased the path length through your code to such an extent that, on those old machines back then, your response times went through the roof. And that was at a time when everything happened on the one and only processor in your one and only system (and not across a network...).
I remember debugging a situation where including a few hundred lines from a file into an edit session took infinitely longer than calling an edit of the whole file. The answer: The module to get a line from that file hid the internal structure of the function, which included opening and closing the file (for each call, i.e. for each line you included - if you called the module to get the whole file you only opened and closed it once ... yeah, well.)
Lesson: Don't overmodularize, rather build services with a meaningful task, and then understand what the "black box" services you call are actually doing.
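A minimal Python sketch of the over-modularization trap described in the comment above, assuming a plain text file. The helpers counted_open, get_line and get_lines, and the file name, are hypothetical. A fine-grained module that hides its file handling opens (and rescans) the file once per call, while a coarser module opens it once for the whole job.

```python
import os
import tempfile

open_count = 0  # how many times any file was opened through counted_open


def counted_open(path):
    """Open a file for reading and record that we did so."""
    global open_count
    open_count += 1
    return open(path, encoding="utf-8")


def get_line(path, n):
    """Fine-grained module: hides the file handling, so every call
    opens the file, scans up to line n, and closes it again."""
    with counted_open(path) as f:
        for i, line in enumerate(f):
            if i == n:
                return line
    raise IndexError(n)


def get_lines(path):
    """Coarse-grained module: one open/close for the whole file."""
    with counted_open(path) as f:
        return f.readlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(f"line {i}\n" for i in range(300))

    open_count = 0
    per_line = [get_line(path, i) for i in range(300)]
    opens_per_line = open_count

    open_count = 0
    whole = get_lines(path)
    opens_whole = open_count

assert per_line == whole  # same content either way
print(opens_per_line, opens_whole)  # 300 1
```

Both approaches return identical data; the difference is 300 opens (plus quadratic rescanning) versus a single open.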
...after they've waited 9 minutes for a page load, they also said "maybe we did something wrong".
MAYBE? YA THINK?? XD
35:38 So in essence you're saying they created this microservice architecture to replace the big ball of mud that was there before, and instead you implemented another big ball of mud that grabs files from some random file server and requires database triggers that no one knows about? Honestly, that's a pretty shitty result.
An excellent answer to a broken concept that never worked, called microservices
Or how to replicate Unix shell pipes on the web
So, the basic design did work. By fixing up the database access time problem the application was OK. A very long presentation to say very little other than how brilliant the speaker was.
"If we weren't meant to use it then why did they give us this feature?" 😂
sounds like anal sex
It's funny how the takeaway is totally non technical. Ironically the last few companies I worked for that wanted to produce microservices got a presentation on how it was their company culture that was preventing them from designing tools that functioned well. We could distill this even further: Sales culture seeks to extract value from customers, Engineering culture seeks to provide value to customers!
I've enjoyed this talk a lot. Thanks!
Great talk thank you very much
I am working on search microservice in my company and what you said is actually really relevant and interesting
A bottle of whisky convinces any architect... until the market pops up with new ones :)
I enjoyed watching the ear. Thank you.
This dude is golden. Awesome talk! Thank you
1:11 "Um, so D... err... Bell computers..." XD
@12:22 First, I don't get the linear addition of times. Some calls happen in parallel in the graph, so we need to use the longest path through the graph, not the total number of nodes.
Second, if we have 200 elements needed for a solution, each with a 0.1% error rate, isn't the overall error rate 1-(0.999)^200, i.e. about 18%?
Please correct me if I'm wrong.
same question, baffled trying to make numbers match
actually one comment below by @matthewabbott7168 explains it all: Except those calls are all part of the same request, and in aggregate eat up the timeout. 19% failure rate + 81% timeout rate = 0% uptime.
Basically, even if we have 81% uptime, remember that our SLA guaranteed an upper bound of 150 ms response time, which is impossible even if each of the 200 requests replies within 1 ms, given they are all chained (200 × 1 ms = 200 ms).
The speaker may sound a bit judgemental to someone who does not have corporate experience, but what he says is so true!
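To illustrate the longest-path point raised at 12:22, here is a Python sketch over a made-up call graph. The service names and latencies are hypothetical, and it assumes a simplified model in which each service does its own work and then waits for all of its downstream calls, which run in parallel. End-to-end latency is then the critical (longest) path, not the sum over all nodes; for a strictly sequential chain, the two coincide.

```python
from functools import lru_cache

# Hypothetical call graph: each service lists the downstream calls it makes
# in parallel, plus its own processing latency in milliseconds.
CALLS = {
    "search": ["catalog", "pricing"],
    "catalog": ["inventory"],
    "pricing": ["promotions", "tax"],
    "inventory": [],
    "promotions": [],
    "tax": [],
}
LATENCY_MS = {
    "search": 5, "catalog": 20, "pricing": 15,
    "inventory": 30, "promotions": 10, "tax": 25,
}


@lru_cache(maxsize=None)
def critical_path_ms(service: str) -> int:
    """Latency of a service that waits for all of its parallel downstream
    calls: its own time plus the slowest child branch."""
    slowest = max((critical_path_ms(c) for c in CALLS[service]), default=0)
    return LATENCY_MS[service] + slowest


total_work = sum(LATENCY_MS.values())
print(critical_path_ms("search"), total_work)

# A strictly sequential chain has no parallel branches, so its critical
# path is the sum of every hop: 200 hops at 1 ms already exceed 150 ms.
chain_ms = sum([1] * 200)
print(chain_ms > 150)
```

With these numbers the critical path is 55 ms against 105 ms of total work, while the 200-hop chain needs 200 ms even at 1 ms per hop.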
That part with laws is super important. Connections between how organizations "behave" internally and the systems they design is fundamental!
Been There, Done That.
Barclays, Deutsche Bank, UBS, you name it... IT department in pretty much any big financial institution is such a swamp… even though salaries are usually higher than on the rest of the market.
It's almost impossible to produce worthy systems with unhealthy goals and metrics and crappy managers in place.
No matter how amazing some individual contributors are or what they suggest, pretty much no one gives a crap.
In the worst case managers might want to get rid of you if your efforts make it obvious that head count can be significantly reduced.
The funny thing is that how to avoid wasting 18 months just to find out nothing works was described in Winston Royce's fundamental paper "Managing the Development of Large Software Systems", published in 1970.
Regretrospective is my new favourite word!
Recently I was speaking with a DevOps engineer in a job interview.
I mentioned (off-topic, not related to the job) that I was thinking about creating a special session (and object) cache that would be much faster than the usual solutions, and I asked what he thought about it and whether he had any comments or guidance.
He told me that since microservices are becoming popular, no one will want such a cache, because everyone "has at least a 1 Mbit connection" and it's only a difference of milliseconds, etc.
Never mind that I really did want such a cache, because I was working with really big finance-related forms and really needed a session so as not to overload connections and to maintain real-time responses; microservices wouldn't work for me at all for those particular forms. But I didn't want to argue in a job interview, so I dropped the topic.
I was actually stunned that he would say such a thing about connection speeds, especially because his company was making web applications for public transportation companies. Also, I always believed that with microservices one has to pay special attention to speed, because of the additional unavoidable latencies that will be far above a few ms no matter what you do.
I have studied some communication electronics, and I know that anyone who assumes big transfer rates for mobile phones (which are used by travelers, obviously the clients of transport companies) is in for a big surprise, even with 4G or 5G. That is simply because while you are traveling, you have no guarantee of getting any connection at all, and mostly you will get lower rather than higher transfer rates because of the physics of electromagnetic wave propagation. 4G and 5G technologies do not actually give any transfer rate guarantees at all; they just raise the upper cap on transfer rates in particular circumstances, but can still fall back to the old 50 Kbit/s and 500 ms latency when the conditions for better speed are not met (and they are rarely ever met for travelers). Anyone can check that on any train, but I already know that this DevOps engineer hasn't been on a train for a long time now :P
I believe microservices are a dangerous architecture nowadays, because so many coders don't take speed into account at all. The biggest error in microservices is that there is no emphasis on "time budgeting". Time budgeting is a term from electronics and mechanical engineering. It simply means that you first set a time limit for the whole action (like a page load), then divide it among all the stages of executing that action (like latency, downloading content, executing logic), and try to keep each stage within its limit.
Organizations want everything to be loosely coupled. The tech, time and resources we put into that have no real advantage apart from profiting the cloud industry.
Denormalize it back to a monolith to make it work, was my takeaway.
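Time budgeting as described in the comment above can be sketched in Python as follows. The stage names, shares and measurements are invented for illustration (a 1000 ms page-load budget on a slow mobile link); the sketch only splits the total budget by share and flags the stages that overran their allocation.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    share: float        # fraction of the total budget allotted to this stage
    measured_ms: float  # observed duration of this stage


def allocate(total_ms: float, stages: list[Stage]) -> dict[str, float]:
    """Split the end-to-end budget across stages according to their shares."""
    if abs(sum(s.share for s in stages) - 1.0) > 1e-9:
        raise ValueError("stage shares must add up to 1")
    return {s.name: total_ms * s.share for s in stages}


def over_budget(total_ms: float, stages: list[Stage]) -> list[str]:
    """Names of the stages whose measured time exceeds their allocation."""
    budget = allocate(total_ms, stages)
    return [s.name for s in stages if s.measured_ms > budget[s.name]]


# Hypothetical page load with a 1000 ms budget on a slow mobile link.
stages = [
    Stage("network latency", 0.5, 480.0),
    Stage("download content", 0.3, 350.0),
    Stage("execute logic", 0.2, 90.0),
]
print(allocate(1000, stages))
print(over_budget(1000, stages))
```

With these numbers, only "download content" (350 ms against a 300 ms allocation) is flagged, even though the total of 920 ms fits the overall budget; that per-stage visibility is the point of budgeting.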
Very interesting survival strategies for developers 00:37:10 onwards
Thinking about Conway's and Jimmy's Laws, I suspect I now understand why so many implementations of standard software (e.g. SAP, Siebel (in its day), etc.) fail... Maybe the SAP guys in the early days had a point when they designed their system to model the "standard, maybe even best practice" organization. That leads me to Gerd's Law: "There is a good possibility of very good reasons why (for example) SAP doesn't exactly match your business's organization: maybe (for example) SAP is right."
So now there's only a 7 minute delay when the customer wants a price with more ram? Or a bigger monitor?
idk, but the HP NonStop systems I've seen didn't look like an HP Superdome or the other PA-RISC boxes in that picture. Especially in the 2000s?
This is very appealing, can’t wait to try it out, thanks!
so.... their solution, their one chance in 15 years to break away from an old system, the thing most developers would kill for, a clean slate.... is now held together with DB triggers, which were done under the table with some DBA who enjoyed fine whiskey....
Some other DBA will come in 4-5 years and be like :? WTF are these triggers for!?
// Whisky river starts here...
THAT. Jesus, I thought the design would be a bit more modern, but this seriously smells. It's nice to bash software that worked for 15 years, something one should never underestimate, but if your solution is based on replicating data, batch jobs running daily with a backup DevOps team and DB triggers, I'm not sure who should be laughing.
If a DBA deletes a trigger before he understands what it is for, he doesn't deserve his job. The last one seemed to have been fired for his inability to document.
Sam Newman's book is good but is targeted at beginners. I was kinda expecting more theory on good/bad patterns and counterexamples.
I think this might be about Dell.
I wonder if their IT was run by some offshore/onshore H-1B factory giving kickbacks to the IT management.
Interesting that you inverted pricing, as in, made the search service store price details within its search engine, I guess. Pricing will be required not just by search but by a lot of other services, e.g. the product details UI, order UI, cart/checkout UI, etc. Do you recommend that each of these services store prices within their own systems? And when there is a price misalignment or a forex change, etc., do you expect all the services to re-sync prices?
@26:04 Get out of the way meathead!
Hahaahahaha Meathead. perfect
He shows the part where you should listen very carefully.
The attack of a giant ear
That was head of micro-services at Netflix
Skipping the ones made in particle accelerators, the most expensive metal is Francium.
"You put your numbers in wrong."
"No, you're wrong." Slams laptop shut and walks away.
"Original architect left to spread his mess somewhere else"
They probably didn't hire someone with the required knowledge and just pushed the task on some mid level dev who was hopelessly out of his depth.
I'd have left too.
I mean, there are a few nuggets of info here, but basically in a broken organization you can't come in and do the right thing.
32:04 out of curiosity, what is the best country to get laptop pricing?
A large side of snark but some really solid points! :)
It's a perfect use case for the microservice Aggregate pattern.
I can’t believe there was no integration test environment prior to production. Bashing other developers and systems so publicly while glorifying your solution (to just one problem area) is not a smart thing to do. Additionally, your recipe for good systems, "change the company structure and then all will be good", is bad advice. You just need good IT specialists who know what they are doing and top business process people who know what they are doing as well. You performance-tuned a service: good for you :-)
I don't know if I misunderstood Dell's requirements or something, but why does Dell want everything in microservices? Can't they build a monolithic application and have some microservices for the components they need?
Isn't 99.9% uptime across 200 calls the same as 0.999^200 = 81% uptime for all 200 calls taken together? Like, the odds of at least one call failing are 19%? I don't think it's even close to 100%.
it's the chance of succeeding 200 times in a row with 99.9% success chance for each
even a single failure will cause complete failure
the chance of failure I got was 1-(0.999)^200 ≈ 0.18, i.e. an 18% chance to fail
Good talk, nice to see some retrospectives. Also, checking whether a microservice has a business entity to represent is a good way to tell whether it is needed or will ever be finished.
Savage! would love to know who the original architect was and what CXO position he/she holds now.
Great presentation, thanks
What is the full title/author of the "DDD" book mentioned in the description of Bounded Context about 21 minutes into the talk? I am guessing it is Domain-Driven Design by Eric Evans. Is this correct?
Vaughn Vernon
I don't understand the "solution" at 30:45.
instead of removing dependency, he's only reducing it to "once a day", with the added notion "or more, if required".... that just seems so silly to me, ???
Whose ear is it at 26:00?
Well if we just take a look at a computer, we'll see that computers basically are designed for data duplication.
If you execute an application stored on a hard drive, parts of it will be copied to the ram and then smaller parts will be copied to the cpu caches. One single piece of code had to travel through 3 different storage systems in order to be executed, the same data had 3 replicas in the end. As long as it has a reason and is done well data duplication is not a bad thing.
Isn't the whole app basically a webshop?
Great stuff and nicely conveyed!!
If we're not supposed to do it, why does the feature exist?
So the story is really an implementation of a distributed mainframe: they did the same as they did in the mainframe, but with API calls. I see this approach with NoSQL technologies too, where databases are implemented as if they were relational. The whole project conception is wrong, but hey, people got paid :)
How does 'Search polls Catalog' (or 'Catalog pushes to Search') mean that Catalog is dependent on Search? In what way is that dependency inversion (which I assume means you take the converse of Search depends on Catalog)? This implies that the search service is dependent on the customer's laptop because that's where Search's information is going. What a perversion of the word dependency
Somehow I have a feeling the company name is Dell :)
are u sure?
Very informative but there's something I didn't get.
When reversing the arrows, does that mean the "search" microservice has a sort of crontab querying the other services once a day to get data? Because he drew a "Denormalizer" outside of the search microservice, but what is this in concrete terms? It feels like it should be 100% specific to the database of the microservice it inserts data into.
Yeah in the example he said they polled once a day, but what you really want to do is to use domain events to update the data of the search. so they are as close to live as possible.
@marcodoe4690 Sagas then, which eventually leads to event sourcing
As described, the architecture doesn't actually "reverse the arrows." Instead, they decoupled the arrows temporally -- they moved it to background jobs. Actually "reversing the arrows" would be publishing an "update search" interface that the other services implemented: when a new model is added, an event is pushed to update the search, and so on. That's the biggest flaw in this presentation.
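A minimal Python sketch of the event-driven alternative described in the last two comments. The EventBus class is an in-process stand-in for a real broker such as Kafka, and the topic names, event fields and SearchIndex class are all hypothetical. Upstream services only publish events; Search subscribes and maintains its own denormalized copy, so no query-time call to Catalog or Pricing is needed and updates arrive as they happen rather than in a nightly batch.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny in-process stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver synchronously to every subscriber of this topic.
        for handler in self._handlers[topic]:
            handler(event)


class SearchIndex:
    """Search owns a denormalized copy of the fields it needs, so a query
    never calls Catalog or Pricing at request time."""

    def __init__(self, bus: EventBus) -> None:
        self.docs: dict[str, dict] = {}
        bus.subscribe("product.updated", self._on_product)
        bus.subscribe("price.changed", self._on_price)

    def _on_product(self, event: dict) -> None:
        self.docs.setdefault(event["sku"], {})["name"] = event["name"]

    def _on_price(self, event: dict) -> None:
        self.docs.setdefault(event["sku"], {})["price"] = event["price"]

    def search(self, term: str) -> list[dict]:
        return [d for d in self.docs.values()
                if term.lower() in d.get("name", "").lower()]


bus = EventBus()
index = SearchIndex(bus)
# Catalog and Pricing only publish; they know nothing about Search.
bus.publish("product.updated", {"sku": "L-100", "name": "Laptop 14"})
bus.publish("price.changed", {"sku": "L-100", "price": 999})
print(index.search("laptop"))
```

In a real deployment the handlers would run asynchronously and would need to cope with redelivered or out-of-order events; this sketch ignores both.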
This guy says it like it is in the crazy realm of my corporatocracy:
"There are people who invent a problem and then create a team that specializes in mitigating it without ever solving it completely, so that these people can further their careers at the expense of everyone else and the company as a whole"
Truer words have never been spoken, yet, most companies would door-slam this guy because he's being "negative". All the better for guys like me for I would hire him in a heartbeat ha!
- He knows his craft in depth, loves and cherishes it and shares his knowledge.
- He's the benevolent dictator that doesn't mess around with computer science and knows when and how to set a healthy boundary when technical debt appears like cancer in an early stage.
Thumbs up and hats off for keeping software engineers well-employed and happy in challenging projects man!
Thanks for sharing!
laughed so hard when he said : "...and... nothing showed up"
When you read a blueprint upside down and backwards then try to build it.
Weird or not, I am trying to be better. To be better is what I try.
Data ready to be fetched is probably the safest way. I was trying to figure out how to build a CRUD app but then was made aware that API keys in source code are bad practice, so they have to go in a .env file. I still couldn't figure out how to get it to work, as it kept showing up as red in Android Studio (Kotlin). Then I researched and saw how .apks can still be reverse engineered to reveal the keys. The best way, then, is to get the information from somewhere else that is more secure and managed, but I'm quite new, so there could be better ways. It doesn't take very many steps to make a proper JSON call lmao
Bude, you're getting a Bell!
I was thinking Dell too.
Great talk and perfect point in the end: IT is never a standalone thing it must be considered as merely a reflection of the actual processes. But I think it would've been nicer not to name the company and keep it anonymous.
This is about Dell, isn't it? The same Dell whose Indian tech support told me that I should not keep a laptop on my lap because it's designed to sit on a desk and won't function correctly otherwise?
Actually he was right xD
Laptops weren't designed to function properly on your lap. Ikr? Lol.
Conway's Law, Jimmy's Law... you completely forgot Cole's Law.
Amazing oversight...
Reminder to self when reading the DDD book: Jimmy says (20.30) to start at the part that's about Bounded Context, and ignore the rest
Did you just reinvent CQRS?
but the yellow box at the end is still a new microservice lol. 1 microservice to rule them all
yes it's 1 new microservice that doesn't call the other systems all the time and is self contained to do 1 thing well: search. so they did a good job on this.
Great talk
Just brilliant!
Nice talk :D Though on the probability front their uptime would've been 100 * (0.999 ^ 200) ~= 82%
He made it crystal clear the calls weren't independent
Yeah, and they went from paper to electronic to prevent stale data... because they were losing money on low prices...
The analogy fails.. but I get his drift..
We’ve tried the data replication and it’s stable, but you fight missing updates... it needs a good document DB that is not hindered by reindexing and searcher reloads...
Super thumbs up Jimmy !
I like the parts where he's talking about software more than when he's dunking on other developers, corporate that they are. The hypothesis that the system is a copy of the organization says that it's not their fault entirely.
It seems that his team is dysfunctional, and he is just blaming the application's architecture (monolithic or microservices). He pointed out that the project manager left before he got fired, typical of a rotten project manager. A good team can make things work in a monolith or in microservices, while a bad team can make spaghetti microservices like the ones he described in horror.
Well it was the obsession with the fashionable architecture that pushed those forces. Also, I'd say a good team is one that communicates well. That's probably 80% of it. The x architecture shouldn't be the aim. The architecture should reveal itself naturally as a good team of people try to solve specific problems.