Learn more about Acceptance Testing and Behaviour Driven Development with my on-line course "ATDD & BDD: From Stories to Executable Specifications". Find out more about my collection of Acceptance Testing courses HERE ➡ courses.cd.training/pages/about-atdd
The common in-practice use of “end to end” testing is quite different today than manual testing in a staging environment. It more often than not includes automated testing rather than manual testing these days. It can still be brittle and slow but you do get quite a bit more coverage if you can stomach the wait times.
Conclusion: the man does not know anything about testing as he doesn't know about software engineering in general. See the last comment I posted. I really should open a channel to debunk all this bullshit.
@@biomorphic yeah. Lots of people who speak a lot about software development tend to want to think they have useful things to teach more than actually knowing what they're talking about. I watch stuff like this anyway for the occasion where they have something correct to say that I didn't already know. More often, I just learn more about the latest buzz words in software development.
What I usually mean when I say E2E is a Puppeteer, Cypress, Selenium, etc. type of test that does things like "When Button A is clicked by a user with conditions X, then API M is called with the correct parameters and the results will reflect the conditions." And similar things like "When API M responds with an error B, we give the user a popup that says sorry, the service is down." Stuff like that. For distributed systems you really need tracing of some kind plus some amount of unit testing or formal verification on the individual components.
Well, he makes exactly the case against that with contract testing. Even for frontend tests, you can have a fake mock-up of your backend or backends and just test against that. That would form the frontend's contract with the backend. Then in the backend you test that the contract matches reality.
@@lukasdolezal8245 I mean yes in general you use a mocked backend. You spin up a mock backend in a docker container or otherwise in a virtual environment somewhere and then run against that. If you have to run an E2E like that in a live environment for every environment even for just 2 environments that's going to be time consuming and brittle.
@lukasdolezal8245 There are several fields where it would be called insanity not to do any end-to-end tests. Would you deploy a telephony system without making a single real call before delivery? It is not only deemed stupid; regulatory agencies demand records and proof of a large set of REAL-conditions tests. You may test every single part of your system, and each signal degradation step may be within parameters, but when you put everything together the interaction of those degradations creates an unusable result. So it is common practice to automatically generate a large number of calls across a large number of routing situations and compare the entropy between the signals at the input and at the output.
@@GoodVolition I think instances like this are better solved by stubbing your backend interface. It's faster to run, less work and your devs don't need to run another service to run the tests. It also reduces the headaches of having to mock authentication.
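A minimal Python sketch of the stubbed-backend idea from this thread, covering the "API responds with an error, user gets a popup" case. Every name here (`StubBackend`, `ServiceDown`, `orders_view`, the popup text) is hypothetical and invented for illustration; nothing below comes from the video or from any real framework.

```python
class ServiceDown(Exception):
    """Raised by a backend client when the service cannot be reached."""


class StubBackend:
    """Hypothetical in-process stand-in for the real backend client."""

    def __init__(self, fail=False):
        self.fail = fail
        self.calls = []  # records every request so a test can inspect it

    def fetch_orders(self, user_id):
        self.calls.append(user_id)
        if self.fail:
            raise ServiceDown("stubbed outage")
        return [{"id": 1, "status": "shipped"}]


def orders_view(backend, user_id):
    """Frontend-side logic under test: turns a backend reply into view state."""
    try:
        orders = backend.fetch_orders(user_id)
    except ServiceDown:
        return {"popup": "Sorry, the service is down.", "orders": []}
    return {"popup": None, "orders": orders}


# Exercise both the happy path and the outage path without a running service.
healthy = StubBackend()
broken = StubBackend(fail=True)
healthy_state = orders_view(healthy, "user-42")
broken_state = orders_view(broken, "user-42")
```

Because the stub lives in the test process, no second service or fake authentication is needed. The trade-off raised elsewhere in the thread still applies: the stub only encodes what we believe the backend does.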
I know a place where E2E testing works relatively well: as a temporary test suite for a legacy monster of a system, before you start refactoring it to make it more testable and add unit tests. It gets terrible when managers see that we have test coverage now, give themselves a high five, and stop giving time and budget to refactor that legacy codebase.
That is especially true because in a legacy system you do not always know what is expected at each point or component output, but you know with a greater degree of certainty what is expected at the end of a LARGER group of components or the whole system.
What a breath of fresh air. Good thing I found your channel. I was shocked how correct my professors at college were when they told me how often companies don't follow good practices, and often end up with a jumbled up system. They killed my passion, but I am in the middle of changing jobs. I hope the new place is better.
I like a lot of the ideas in this channel. However, I think maybe a lot of them can only be achieved in rare perfect conditions where you have for example a.) Complete control of development related activities in an organisation b.) A team of top notch developers who all generally get on well with each other c.) People focused more on what's best for the team rather than their own career. In the real world though things are a big mess and you might have better luck herding cats.
I'm looking forward to your practical tips for how to better navigate our imperfect world; you clearly see aspects that can be better handled some other way, and the way we should be handling them.
Choosing where to work based on the alignment with your values is an important part of looking for any job. Even if it doesn't fully align, you have every opportunity to present your case to other developers to win them over. I spend a lot of time trying to win hearts and minds, especially with some of these 'radical' ideas. At the end of the day your results will speak for themselves and you might get more buy-in than you think. It's not always a lost cause; some people just don't know anything else.
Tim, interesting that you focused on the social side. Our issue is more that we have very few elements where the interfaces can be cleanly separated. Just a few and those are not the areas where we have problems.
I would have liked more information on the cost to build and maintain Dave's "test rig". By the looks of it, he's built entire fakes, but the trouble (aka effort) with fakes is that they need their own unit tests to ensure they behave correctly. I do e2e testing on my REST APIs before I deploy, but the effort of keeping my test rig data maintained is driving me nuts.
In my experience, whenever I rely on mocking systems, the tests all pass, they are durable, everything is great... and yet in the real production system, things are breaking. The tests just adapt to the mocks over time, and so we aren't testing the real thing. Then, we have to increase complexity of the mocks, until... they are approaching the complexity of the app itself! It seems hopeless, so I just do e2e and do my best to still isolate things. Tough, very hard, automated testing is harder than building almost anything else. Still, good point of view and glad to hear it, thanks!
The commonest mistake that I see here is not abstracting enough, or not abstracting appropriately, at the boundaries that you mock. How can the mocks be wrong or different from production? If you define a contract and mock the contract, and production behaviour then differs, then by definition your contract isn't good enough. Practically, the mistake that people often make is to mock at technical boundaries, even mocking 3rd-party interfaces. I think that is a mistake, because those APIs are big, complex and general. Abstract the conversation with your own layer of code, so that it represents the simplest version of the conversation between these pieces, and don't allow any routes of conversation outside of these interfaces. Now you can mock those conversations with more confidence.
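One way to read the advice above, sketched in Python under stated assumptions: the application talks only to a small interface of its own (`PaymentGateway` here), and tests fake that interface rather than the vendor's API. The payment domain, `PaymentGateway`, `FakePaymentGateway` and `checkout` are all hypothetical names chosen for illustration, not anything described in the video.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Our own, deliberately tiny view of the conversation with a payment vendor."""

    def charge(self, customer_id: str, amount_pence: int) -> bool: ...


class FakePaymentGateway:
    """Test double for our interface, not for the vendor's large general API."""

    def __init__(self, declined_customers=()):
        self.declined = set(declined_customers)
        self.charges = []  # successful charges, kept for assertions

    def charge(self, customer_id: str, amount_pence: int) -> bool:
        if customer_id in self.declined:
            return False
        self.charges.append((customer_id, amount_pence))
        return True


def checkout(gateway: PaymentGateway, customer_id: str, total_pence: int) -> str:
    """Application code: it only ever sees the narrow interface."""
    if total_pence <= 0:
        return "empty"
    return "paid" if gateway.charge(customer_id, total_pence) else "declined"


gateway = FakePaymentGateway(declined_customers={"bob"})
alice_result = checkout(gateway, "alice", 1999)
bob_result = checkout(gateway, "bob", 500)
empty_result = checkout(gateway, "alice", 0)
```

A real adapter implementing `charge` against the vendor would sit behind the same interface. The fake then has only one small method to keep honest, which is the point being made about keeping the mocked conversation simple.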
1000% -- It's refreshing to see someone discuss the things that have been on my mind for quite a while, and it's very difficult to get some people in organizations to let go of the false sense of security that large complex e2e or integration tests provide. They call it "quality", while it's anything but quality. What's more frustrating is that resistance is often combined with arrogance and unwillingness to learn. Actually I'd venture to say it's the opposite of quality, because you are putting confidence in a methodology that has simply proven to be inadequate for testing large complicated systems -- especially today where each component may have connections and talk to multiple others, and so on. Also, your videos have the best animations!
It’s sad to be in this situation. People have difficulty focusing on problems that really exist; they prefer to blindly follow standards like the test pyramid. So they follow standards and get a false sense of confidence, instead of focusing on the problem and thinking about the simplest thing that could be done to prevent it from happening.
Excellent topic, but I could not disagree more completely with regard to NOT doing E2E testing!! My main point of disagreement is your claim that an alphabet soup of automated tests (unit tests, integration tests, TDD, BDD, etc.) run by dev/QA testers, with fake/mock endpoints as required, is all you need in order to do end-to-end testing. I have been developing an algorithmic options trading system for the past 5 years now, and the main end-to-end testing I do is always from a complete business workflow point of view, as submitted by the end user, which I do AFTER I have completed all of those automated tests that you mentioned, with which I am in complete agreement. A single end-to-end test for my algo trading system is to: create all required reference data, open a trade, monitor the trade in the app with real-time market data, and close the trade exactly as I would in a production system. No faking of any APIs with mocks, but definitely use third-party testing resources as required, with the only difference from production being configuration changes. Any data that looks like production data should be clearly marked as test data, and this needs to be designed in from the get-go. In this way, end-to-end testing should be very simple, business-user-workflow oriented, and exhaustive enough to verify at a minimum that the application meets the business requirements from the developer's point of view. Once this is met, you can move on to business User Acceptance testing and production Pilot testing with a clear idea of successfully delivering a product that was clearly defined by the business. At least for me, E2E testing should verify your business transactions in real time and how the system manages any error situations that could arise in production.
I think the key thing is just test at many levels, and test the right things at each level. Basically the testing pyramid. Unit tests have a place, integration tests have a place, contract tests have a place and even end to end tests have a place, but don't use end to end tests to test things you can test at a lower level. For example, your end to end test might just be a simple happy path that you can use during canary or blue/green, which can do things that contract tests can't, like verify the infrastructure and services together.
This. Your answer puts it much better. There are things for which you NEED E2E tests, so do them for those. Just try to use a smaller scale for things that can be caught at that scale. A simple example is system performance: E2E testing is the only kind that will give you a REALISTIC measure of performance on a system that is asynchronous, parallel and has several dozen stages.
It depends where you think the "end" is. If I am testing the performance of a financial exchange, I don't need to also test the performance of clearing and user-registration. So those external systems can be faked for most of my performance testing.
@ContinuousDelivery The further you move the "end" towards the user's interface, the better you find actual problems. Your financial exchange might take 2 seconds, but the result might be shown in limbo for minutes due to aggressive caching in the frontend.
100% it's a shame that Dave chose to use the inflammatory "don't do end2end testing" rather than the more advisable - test appropriately at each level of integration.
Not what I expected to hear. I missed any mention of the fact that E2E tests are very valuable for monitoring. You don't want your users to be the ones who detect that the checkout process is broken; you should detect that early with E2E tests. Run them continuously on all environments, even production, to validate that business-critical paths work. Of course E2E tests fail from time to time. That's expected. Software updates on the user's side or API changes in external systems happen all the time. It's our job to build software that is resilient and keeps working. And if it doesn't, we should be informed swiftly so we can look into it. In the end, neither your unit nor your contract tests matter if you don't recognize that your app is broken for end users due to a browser update or a tiny change in an external system you don't control.
@-Jason-L Should, but you cannot! If you are making a website that sells t-shirts you can risk it, but if you are making systems where a single minute of downtime can cost millions or, worse, LIVES, then you CANNOT TRUST ANYONE'S word.
You really are supposed to have the fewest tests in E2E; instead, many developers argue for putting everything in E2E because the code sucks and "UTC is hard." It's very frustrating. If you actually have good code and good unit tests, E2E replaces a simple happy-case sanity check on the app; it's just sanity. It's a lot of work to get E2E right (don't mock the backend, that makes no sense), so I would suggest literally taking an hour or two and doing that sanity check manually. Manual sanity checking is needed anyway. Most companies can't coordinate between conflicting features; the AC is correct for everyone but the user. Seriously, E2E is not the answer.
@johnjackson6262 It is not the answer, but NOTHING is the answer anywhere in software development. Everything is a tool, and tools are to be used when they are deemed the correct tool for that specific moment. The simplest example is ensuring that the performance of the system is still within acceptable parameters. You can only be sure about performance in an asynchronous parallel system when you run the system in a complete state.
It's your responsibility to make the organisation less dysfunctional (you are also the organisation). You are doing the work, and you know more than the org about how best to test your code. Developers have more power than you think to drive change, especially in dev practices. Start using it.
@@k0zakini0 In general I agree. I have tried to do that to the extent possible. Usually I find myself as a newcomer in an organization that is already large and mature. It's much harder to move the needle in situations like that. Oh how I would love to lead an organization to the Promised Land, but sadly, that just never seems to be in the cards. That's why I work on my own startup.
@k0zakini0 They are still employees; they don't really have a say, especially in things which could add short-term costs to production. This is the reality of our economic system.
I recently started implementing Cypress in my team's app of my own volition, and luckily this is an endeavor the company would like to move forward with. Doing it really showed me the value of system tests, where we can go over every single circumstance, whereas our e2e tests only cover the very limited happy path. E2E really started looking completely useless to me outside of accessibility testing and the most basic of pipeline checks for PRs.
Dave, just wanted to note that the thumbnails are starting to verge on cringe. I understand the "nature of YT and click-bait", but for God's sake, there is a limit. We get here because of your competence and level of delivery, not because of a weird click-baity cringe thumbnail showing some exaggerated and fake reactions/emotions.
Unfortunately the slightly click-baity titles make a huge difference to the reach of the videos. I would like it if we could do more straightforward titles and thumbnails, but if I want my videos to be watched, and I do, then that is simply not how YouTube works these days.
@ContinuousDelivery Thanks for the reply. Turns out I will need to close my eyes next time and just click. The chasm between the style of the videos (which are professional and to the point from the first second) and the thumbnails is so large my brain is getting fried. The only thing left is to cry about the future of humanity as a species if we need to do things like that to educate people.
We replace the word « system » by « object » and it becomes the definition of a unit test. Unit tests, integration tests, acceptance tests and e2e tests all look the same. A controlled input to assess expected output. The only differences I see are the relative scale and the tools used to control inputs and assess outputs. What am I missing? We can easily imagine acceptance tests of a library/package barely being an integration test of another more complex system. The same way we could imagine unit testing a group of external systems from a system substantially bigger. Like if someone decided to test the internet. Maybe there is a system where we unit test the internet
The "only" difference is relative scale. But that is on every metric. It increases chances of false positives and false negatives, it increases the burden when setting up your controlled input, and it increases the burden when filtering through your output. You may as well say that a unit test is the same as measuring the electrical signals coming across your ethernet cable - the only difference is the scale of the data you have to send and sift through, and there's lots of inconsequential changes that could cause a test (or heck, all of them) to fail. Or taken to the extreme, the input and output is every atom in the universe, we simply have to control the state of the whole universe before running our experiment then measure the state of the whole universe after it is done.
I think this depends a lot on the domain. My background is in air traffic management systems where it is essential to test in isolation using simulators, which are themselves complex systems requiring their own testing (but can double up as a training tool for end-users). This is in no way a substitute for E2E testing though, which can, and probably will, reveal issues that escaped all prior testing. The importance of this level of testing is such that a new system will often shadow a pre-existing system for some time before becoming operational. This is obviously the most expensive time to discover an issue, so if there is any way of incorporating some degree of E2E testing earlier in the project it could be very valuable.
Well in reality that end-to-end test is going to happen one way or another. You just have to decide whether you want to do it before the customer does.
Your proposal is ideal but requires every layer to be on board. If you have a system of 15 layers and 4 layers have little to no tests then writing a single all layer e2e test is more cost effective than writing 4 tests and will still prove that the whole thing can be shipped. In this way full e2e are used as a stop gap. 14:40 I think you mean that the API can't be trusted and the solution is to better specify your APIs which I agree with. If you mean the network can't be trusted I don't think there's anything you can do about that.
For software engineering this might be the correct approach. In data engineering and big data I feel we must do end to end testing because I have to test all of the system under stress
One director I worked for a time ago seemed to be fundamentally against anything that streamlined the development cycle, because he thought if it was too easy then his project would attract lazy developers. I found that mindset quite discouraging.
I like to explain the fundamental problem of testing in general to teammates with the concept of "feedback time", i.e. the time between a mistake being made and the developer learning about it. The goal is to constantly minimise this feedback time across the organisation. Contract tests and fakes at the interfaces are certainly very good tools for achieving a shorter feedback time when it comes to more complex systems.
@@ContinuousDelivery How would I answer the objection from a co-worker: "Certifying on mocks or integration tests would not be sufficient and that would lead to more bugs surfacing in production. For such a complex product with multiple systems working together, it's best to test as close to prod as possible i.e. prod or an env which is a close replica. Without that, there is going to be very little confidence in what we certify."
Yet that is not enough. For a client, the product is the interaction of the code with the environment. So systems that operate in complex scenarios that cannot be isolated (telecom, for example) usually need E2E testing in a live scenario to evaluate extraneous factors.
@@tiagodagostini I am pretty sure that was the point of @Victor Martinez - pretty sure it should be read as "That being said, tests are experiments..."
I love this channel, but I completely disagree with what you're saying at 9:35 - "we want to deploy our system" - NO, we want the whole thing to work. If we are on PROD but the whole scenario is not working and customers cannot use it, what is the point? Integration is the most challenging part nowadays, but this way of thinking puts the whole idea at risk: we are on prod (system B), we don't care about system A, and if they are not on PROD it's not our problem. Yes, E2E is time consuming and so on, I agree with everything you said, but the proposed alternative does not work either.
E2E tests are pretty useful before rewrites / refactors. You scaffold your app with tests, and then you have more confidence going into changing parts without breaking things. Constantly maintaining e2e tests, however… usually just yet another useless layer to keep engineers busy…
Nowadays when I think of e2e testing, I think automated.... like Cypress. So far a much more enjoyable TDD experience compared to any web unit testing I've done, so much more efficient than manual testing. Manual staging environment testing is very expensive and I've only seen it lead to arguments between the testers and devs.
It may seem like a disadvantage that your e2e testing environment is dependent on other teams' software that is in flux, but if you don't test the interactions of different changing systems somewhere, you'll be testing them in prod. I've recently had another system that my own depends on change its response to an API call in QA without ever informing us. Our e2e environment tests caught this, and we discussed it with them and got in sync again. If we'd just faked their responses, the issue would have appeared first in production. I'm not sure it's necessarily as clear cut as this. In environments with a lot of different teams/services that all interact, things will be messy any way you shake it. There is something to be said for not trusting other teams to communicate perfectly about changes to their systems that affect you. I would rather things be messy in testing than have to point fingers and pull up receipts after a live incident.
Watch the last part again. The trick is to utilize contract tests to always be sure your assumptions about the other system are correct. The whole thing is to be able to run the tests of your own software as fast as you can, while retaining the same confidence (and more) in how it interacts with other systems
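A minimal sketch of that trick in Python, under stated assumptions: the same contract check is run against our fake and, less often, against the real collaborator, so a fake that has drifted from reality is caught. The price service, its field names and `contract_violations` are hypothetical; `DriftedPriceService` stands in for a changed real service so the sketch stays self-contained.

```python
# Fields our system relies on in a reply, and the types we expect (hypothetical).
REQUIRED_FIELDS = {"symbol": str, "price_pence": int}


def contract_violations(service, known_symbol, unknown_symbol):
    """Return a list of ways `service` breaks our assumptions about it."""
    problems = []
    reply = service.get_price(known_symbol)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in reply:
            problems.append(f"missing field: {field}")
        elif not isinstance(reply[field], expected_type):
            problems.append(f"wrong type for {field}")
    try:
        service.get_price(unknown_symbol)
        problems.append("unknown symbol did not raise KeyError")
    except KeyError:
        pass
    return problems


class FakePriceService:
    """The fake our fast acceptance tests run against."""

    def __init__(self):
        self._prices = {"ABC": 1250}

    def get_price(self, symbol):
        if symbol not in self._prices:
            raise KeyError(symbol)
        return {"symbol": symbol, "price_pence": self._prices[symbol]}


class DriftedPriceService:
    """Stand-in for a collaborator whose reply format changed without notice."""

    def get_price(self, symbol):
        return {"symbol": symbol, "price": 12.5}


fake_problems = contract_violations(FakePriceService(), "ABC", "NOPE")
drift_problems = contract_violations(DriftedPriceService(), "ABC", "NOPE")
```

In practice the second call would point at the other team's test or beta endpoint instead of a local class. When it reports problems, the fake is updated first and the fast tests then show what breaks in our own code.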
I think in an ideal world, most people would prefer to use contract testing & something like WireMock to have faster running tests & to just always trust that collaborators will always work as we expect them to. Unfortunately in a more realistic world, I think E2E tests that exercise real collaborators DO provide value despite their overhead & brittleness. Almost as a form of a smoke test for critical paths before a deployment which is why it also probably makes sense to keep them low in number, so less to maintain. Most workplaces IMO would probably benefit from supplementing their unit tests with both the types of tests described in the video (higher level, black box tests with mocked collaborators) AND the more traditional E2E type to provide higher levels of confidence, versus completely ditching traditional E2E tests. So like most things in software, to do E2E or not to do E2E... it probably depends 🤷♂️
As I say in the video, I think that the best scope for testing is "a releasable unit of SW" and you test that in a deployment pipeline all the time, so there is no need for smoke tests, because you test the whole thing properly, but then, that is the system that you are responsible for - all of it, shared code ownership and all. You can, and should, do this as a collection of small teams. But scope of release and acceptance testing is a releasable unit.
My own experience of owning a low-code integration platform where we have different APIs (about 50-60 different vendors) we integrate with suggests that both approaches have validity. WireMock is a great tool and something we use extensively. This provides confidence that the captured state (which the test uses) can be tested. However, at the same time nothing beats testing with the real API. We've had a few instances where something subtle has changed on the vendor's API AND bang! I honestly doubt that a stubbed test would have caught these instances, whereas a test against the actual service MAY have 🤷‍♀️ The level of testing is of course driven by the value of the API/connector (some have much more use than others) and a feeling... if we think the vendor's API is going to suck/be brittle/change without notice then we try to add a test to attempt to catch that.
@@jimiscott yep, what you're saying heavily mimics one of the problems my most recent place had. Lots of WireMock (great tool but won't - as you said - cover all scenarios) but nothing that catches the BANG scenario you described! Agree that the right answer is to probably run with a mixture of both types of tests as they cover us for different things.
@@ContinuousDelivery "A releasable unit of SW" - Ideally consists of "All" the Systems -> not just System B - You are going against the Agile's feature team concept here - A team is responsible for delivering a value to the end user - Not just a front-end or a back-end. If you are only responsible for the "Front-end", you may use the contract testing - but then the team is not delivering any value to the end-user - they are not delivering any User story - just a bunch of code useful for nothing
@Marck1122 If multiple releasable units only together constitute a new release, they are not multiple releasable units and should be treated as a whole. It is better to separate them up to allow individual progress. A single unit should be releasable independently, and the only thing you can properly test for in that case is the unit itself. Dave here promotes a mix of contract tests to validate interaction with other units, and simulated acceptance tests to actually test all possibilities (impossible with the actual service)
Not sure if the recommendation is to convert e2e into integration testing as much as possible? Maybe better advice is to minimize e2e tests and capture maximum value through other forms of tests. Not doing e2e tests is bad advice.
I've generally found that mocking system interactions for testing doubles the workload placed on developers. In a perfect world everyone shares their test cases, but even then it's much cheaper to have people write proper documentation and public interfaces - if they can't do the latter, they definitely can't do the former. Imagine if all consumers wrote tests to submit to all software libraries that they used to inform the publisher about how they were using the library.
I have to say, after listening to this and creating unit tests & integration tests for each system, plus E2E tests to check that the systems work together as expected, I realise that my e2e tests actually test high coupling between the systems. Always good to listen to these videos, even if I don't agree with some of the points :)
Your description of "system tests" seems more like integration tests to me. The SUT is put in a simulated environment and expected to "integrate" (work with) the connection points presented to it. System testing -- actually testing the whole system -- should test only those aspects of the system that can't be duplicated in a test environment. A full system test basically should be a combination of functional testing (does the whole system work?) and school of hard knocks regression testing (Bob forgot to configure this once, so I guess we have to test for it now). System testing should strive to be minimal and lightweight. Too much system testing results in the bad system you describe: a ponderous final test environment that leads to too much surprise late in the testing cycle and makes releases more difficult than they should be.
Especially because in several fields the clients DEMAND proof of end-to-end testing. When someone says not to do what your clients DEMAND, there is something wrong.
@tiagodagostini If someone would pay you to run a marathon but demands that you balance a pineapple on your head the whole time, you'd probably tell them to get lost.
@grrr_lef Well, if I have no time limit and the pay is really good I might try it :P Also, more on the subject... you cannot tell your clients to get lost. That is not how successful companies work, despite what Hollywood might think.
15:55 “for each we faked it when acceptance testing … our stub that represents an external third-party system … from the perspective of our system under test it doesn't know that this is all faked” Likely my confirmation bias but I'm noticing the absence of any term starting or ending with “mock”. 17:07 “we ran our own contract tests against each end point usually the endpoint's beta test site … we sometimes had contract tests that would fail against their beta site and we'd change our integration with it once we understood the change. We'd do this by first changing our simulation in our tests [i.e. test double]” The “drifting test doubles problem”. It may seem like a lot of additional (testing) effort but those tests capture “our understanding of how the RealThing™ works” so there is value in having the feedback that it's no longer accurate.
How do we know if the dependent system breaks our contracts if we are always testing against mocks? I’ve seen the approach of spinning up all dependent services in a Docker network with Testcontainers. So you always test against the real versions. If a system B that your system depends on has a new version released, you just bump the version that you are testing against. That way your functional/acceptance tests in the pipeline will give you faster feedback. But I’ve also seen that not working if system B depends on C, which depends on D… and so on. Then it either becomes a big ball of mud, or you start to mix WireMock stubs and real versions of services, which I still find reasonable compared to mocking everything.
Contract tests validate services are returning what the clients expect. Integration tests validate that systems are actually calling each other as expected. So you are not always testing only with mocks.
I think the title is incorrect. There is nothing wrong with E2E testing if you automate it, and results are evaluated automatically. In my company we do massive E2E load test, to precipitate any problem much faster than any customer can. The application is subjected to this test after any kind of changes, every time. We fire 1000s of transactions simultaneously, doing every possible operation non-stop for an hour. For major changes, we run for 24 hours non-stop. If nothing breaks, nothing leaks, then it can move to release state.
I'd agree. I think how much E2E testing you can/should do - or even what the term means - depends on many factors. We certainly should not depend on E2E testing only, and no test - no matter what kind - can guarantee 100% correctness, but doing no E2E testing at all? Good luck.
Big ball of mud, exactly what I am dealing with, it is so difficult to get a hold of the state of the application. Every test returns a slightly different result than the one before. Extremely difficult to test.
Basically no organizations achieve the ideal. When we evaluate idealistic recommendations like this one, we have to question: “if we don’t make it all the way, is it still valuable?” The answer here is unequivocally no. E2E tests are plainly, clearly valuable all the way until you achieve your contract perfection, which for practically everyone is never. This is not a good recommendation, because it’s all or nothing.
I've developed an internal CI/CD pipeline for a few years, and he's mistaken: CD is not what "makes sure the software is in the deliverable state", because that's actually the goal of CI. In contrast, CD takes a version that passed CI and automatically deploys it into production.
All kinds of tests are useful as long as you know when and how to apply them. End-to-end testing might be painful but is sometimes crucial and unavoidable: for example, we can use end-to-end tests when developing "minimum viable prototypes" to test in practice whether the basic assumptions for the grand architecture truly hold, so that we can decide early on whether it's worth investing further in a system or whether it's doomed to fail. All kinds of tests have their place in the grand scheme of things.
I feel like Dave mainly describes API-based systems, where it all stays true. UI end-to-end testing is only fragile because you are using a unit-testing approach to write end-to-end tests. @testrigor allows you to write tests 100% purely from the end user's perspective with zero reliance on details of implementation. This way some of our customers run tens of thousands of end-to-end flows with 0 flaky issues many times a day.
We are currently in the process of implementing automated e2e testing, namely to test for regressions in stabilized APIs. I'm already concerned about the setup and cleanup needed. The test app will need to set up before and clean up after each test. That's fine when it works, but it introduces a potential failure state during the test, which can be cleaned up automatically when the test hangs, but that's additional complexity. Our QA engineers might be stressed by the maintenance complexity involved and require additional dev time. However, the promise is that regressions could be tested by pressing a button. These regression tests are not part of the pipeline; they just help testers run through previous behavior while they test the new behavior manually. Yeah yeah, manual testing, I know, it sucks, but we're trying to do the best we can with what we've got. However, Dave here gave me pause for thought. Not sure if I got it right regarding contract testing between systems. But technically it would be possible for system A to expose its part of the contract, and for system B to fetch the contract input and run the test with it, testing the output or state change? Hence each team maintains its own set of contracts and doesn't care about the rest. Maybe that sounds overcomplicated too, and what Dave means is just "unit testing" at the scope of a deployable unit, without any outward communication whatsoever. But that scares me, given that such tests may give false confidence, as they rely on people communicating at ad hoc times. That seems to break the whole determinism promise, in a way that will only show up in the prod env. Anyone care to explain in more detail, maybe? What did I get wrong?
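One way to picture the "system A exposes its part of the contract, system B runs it" idea is to treat the contract as plain data that the consumer publishes and the provider verifies in its own pipeline. The sketch below is only an illustration under that assumption: the `/orders/...` paths, field names, and `provider_handle` are hypothetical, and real tools (Pact, for example) work differently in detail.

```python
# Contract published by the consumer (system A): the interactions it relies on.
CONSUMER_CONTRACT = [
    {
        "request": {"path": "/orders/42"},
        "expect": {"status": 200, "required_fields": ["id", "state"]},
    },
    {
        "request": {"path": "/orders/unknown"},
        "expect": {"status": 404, "required_fields": ["error"]},
    },
]


def provider_handle(request: dict) -> dict:
    """Hypothetical request handler inside the provider (system B)."""
    if request["path"] == "/orders/42":
        # Extra fields are fine; the consumer only requires "id" and "state".
        return {"status": 200, "body": {"id": 42, "state": "shipped", "carrier": "n/a"}}
    return {"status": 404, "body": {"error": "not found"}}


def verify_contract(handler, contract) -> list[str]:
    """Replay each contracted request against the provider and collect mismatches."""
    failures = []
    for interaction in contract:
        path = interaction["request"]["path"]
        expected = interaction["expect"]
        response = handler(interaction["request"])
        if response["status"] != expected["status"]:
            failures.append(
                f"{path}: status {response['status']} != {expected['status']}"
            )
        missing = [f for f in expected["required_fields"] if f not in response["body"]]
        if missing:
            failures.append(f"{path}: missing fields {missing}")
    return failures


failures = verify_contract(provider_handle, CONSUMER_CONTRACT)
```

Because the provider runs this check on every commit, it does not depend on teams remembering to talk to each other: a change that breaks what the consumer relies on fails the provider's own build.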
I agree with you, but only, if we are talking about a perfect world where the company has a very and well-developed culture of testing, QA, and CI/CD runs in a perfect scenario. For all other realities, E2E testing is a necessity. I really wish that things work so well that we could just drop using E2E tests, I truly agree with your arguments, but most companies are not yet ready.
I like the idea that we can use a range of different types of test, and through all of them we will gain confidence in the system. Basically, if you only ever tested each piece of the system on its own, I'd not feel confident releasing that without at least some sort of e2e testing that checks that they all gel together properly. Still, very interesting points made in the video here. I especially like the discussion about how systems A, B and C relate to each other test-wise; I suppose this is where the ideas in Mockito might come from. One cool thing when it comes to e2e testing is the idea that if you carry out a handful of business scenarios that together cover all of the pieces in the system, and each one has several tests to go with it, then if they all pass it's unlikely there's something seriously wrong, as each piece would've had to do its job successfully when interacting with other pieces in order to produce the right result multiple times. It is always possible that you've got bugs that don't affect the result, or multiple bugs cancelling each other out, but this gets progressively less likely with each test scenario.
You sound like you work in a perfect world where there's tons of time to write and test every aspect of the software. The rest of us, who work under the gun with limited resources, don't have the luxury to do that.
So it's: write proper APIs, proper clients, and proper mocks of how the system acts when sent certain inputs, then share them with the other teams, so all one needs to do is use them properly while developing the system under test (what to mock, and what not to mock). Please tell me if I got this right.
I have already been in a scenario where the tests were skipped as a management decision, while a lot of pressure was also put on the (mostly inexperienced) devs. Of course it failed quite spectacularly. One of the things that was done while developing was to at least implement a lot of unit testing. It was crude and partially more creative than really useful, but it provided quite high coverage, around 90%. One of the worst bugs was a saving process which triggered multiple processes via the frontend, which in the end used the same storage, resulting in them overwriting each other's data. The individual processes all had perfect coverage, but they hadn't been tested together, which led to the bug reaching production with quite terrible consequences. A very simple e2e test would have caught this immediately (and of course we implemented one, along with a few others). If you build a proper pipeline, it should not be a problem to implement an automated testing stage that at least simulates a user going through the critical paths, to make sure that there is no oversight in the combination of some new features. If any bug is found which is the result of one of these combinations, you should also add one more of those tests to make sure that this will never happen again.
E2E is not complicated - It is what the real customer or the end user is going to do when they use the product. E2E is projected as complicated by vested interests -aka companies that don't want to spend money on testing and lazy developers who don't want to take accountability for their bugs. I have seen enough of crappy apps and websites in the last 10 years where even the basic scenarios are failing but I'm sure the development teams had "contract tests" like the ones mentioned in the video that are passing and the pipeline was "Green". In such scenarios, I have seen the development teams don't even want to take accountability for the Issues and simply pass on the blame because they are not responsible for the full system. Over Engineering is killing Software quality and the snake oil salesmen are selling "don't spend money on testing" ideas to the companies that can ultimately lead to loss of business.
> One of the worst bugs was a saving process which triggered multiple processes via the frontend, which in the end used the same storage, resulting in them overriding each other's data. It's a hilarious example of unit test blindness.
Interestingly enough, this is a debate we're having at my work now. The problem is, it doesn't matter how tightly you control for your upstream dependencies when they change on you without letting you know. This happened between 2 of our teams and resulted in a bug that reached production, caused a feedback loop when purchasing under certain circumstances, and resulted in customers being charged multiple times for a single product. There's been some back and forth on how much emphasis we put on tests that pull several levers to check for regressions.
Let's say the user knows about the full system only and has no knowledge of A, B, C. If it is technically possible to test the full system, why would you spend time on these systems individually instead of the full one? And why should they do more than they need to? I understand there are usually technical issues, but just theoretically. Maybe A, B, C is just because of legacy or bad design. Maybe next year it should be just A… or A, B, C, D, E, F, G… Point is, different levels need different approaches and strategies. Too easy to say that E2E should go, but ofc, selfish engineers that ”own” B would love it 👌😅
@@carlbergfeldt818 E2E is typically way more complicated to do than smaller scale testing, and tends to steal resources from more targeted testing. That combination tends to lead to deficiencies that aren't noticed - which pop up later as really difficult to troubleshoot bugs. You have way more visibility at the boundaries of the individual components/systems. It also tends to lead to undocumented changes in the specifications of the interfaces/boundaries because teams are trying to 'just make it work' rather than pushing back on the team that actually introduced the bug.
@@rich63113 E2E is not complicated - It is what the real customer or the end user is going to do when they use the product. E2E is projected as complicated by vested interests -aka companies that don't want to spend money on testing and lazy developers who don't want to take accountability for their bugs. I have seen enough of crappy apps and websites in the last 10 years where even the basic scenarios are failing but I'm sure the development teams had "contract tests" like the ones mentioned in the video that are passing and the pipeline was "Green". In such scenarios, I have seen the development teams don't even want to take accountability for the Issues and simply pass on the blame because they are not responsible for the full system. Over Engineering is killing Software quality and the snake oil salesmen are selling "don't spend money on testing" ideas to the companies that can ultimately lead to loss of business.
In the test case with invoice rejection, I would add an assertFail() call after the line that is supposed to throw an exception. This way I make sure the test fails if no exception is thrown.
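For anyone unsure what that looks like, here is a rough sketch in Python's `unittest`, not the code from the video. `submit_invoice` and `InvoiceRejected` are hypothetical stand-ins. The first test uses the explicit "fail if we get past the throwing line" pattern described above; the second uses `assertRaises`, which gives the same guarantee with less ceremony.

```python
import unittest


class InvoiceRejected(Exception):
    """Hypothetical exception raised when an invoice is rejected."""


def submit_invoice(amount: float) -> str:
    """Hypothetical system under test: rejects non-positive invoices."""
    if amount <= 0:
        raise InvoiceRejected("amount must be positive")
    return "accepted"


class InvoiceRejectionTest(unittest.TestCase):
    def test_rejection_with_explicit_fail(self):
        try:
            submit_invoice(-5)
            # Only reached if no exception was raised, so the test must fail here.
            self.fail("expected InvoiceRejected, but nothing was raised")
        except InvoiceRejected:
            pass  # the expected outcome

    def test_rejection_with_assert_raises(self):
        # Fails automatically if the block completes without raising.
        with self.assertRaises(InvoiceRejected):
            submit_invoice(-5)


def run_suite() -> unittest.TestResult:
    """Run both tests programmatically and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InvoiceRejectionTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Without either guard, a test that merely catches the exception would also pass when the code silently accepts the invalid invoice.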
E2E tests find incorrect assumptions between parts of the system. I would agree that you don't want more than a few E2E tests because if you haven't got a grip on interface contracts, E2E tests will be fragile and can become a serious speed bump until the revealed problems are addressed.
I'm not sure how we can implement ATDD in a microservices architecture without essentially conducting end-to-end tests. If our acceptance tests require interactions between all our microservices and the frontend, doesn't that mean we are essentially performing end-to-end testing?
I try to limit "end to end" testing the way you talked about it here to simple smoke tests. somewhat questionable in terms of utility at times, but they do catch if something goes super wrong in deployment. The only huge drawback is if it fails it tends to be more difficult to debug, and it almost always indicates a test gap somewhere more fine-grained.
I would say it depends. In a monolithic old-style application, it is easy to demonstrate the benefits of isolated, fine-grained testing like unit testing. But let's talk about a distributed, microservice-based architecture. The microservices themselves (if they are "microservices") are small pieces of code that have a well-defined, unique, and very limited business purpose. Testing those e2e could be reasonable. Now let's take it further and suppose you're using something like Flink for your streaming implementation. How do you test this kind of application? You cannot test your "streaming logic" outside of Flink, since it is pretty much implemented in Flink's terms. All the 'filter', 'map', 'keyBy', 'windowing'... you name it. You have no other choice but to execute a Flink application e2e in order to see the results and validate them. I talked about this issue with Flink's maintainers and they agreed that Flink's architecture does not allow you to isolate the definition of a stream from the underlying engine. This is a huge drawback of their architecture.
I work in a heavily regulated environment. While we do test every piece independently we also deploy to a test environment that is a clone of production and a group of people has to manually check to make sure everything is correct with a well documented test script to follow. With lives on the line for making a mistake the process is slow to prevent any accidents.
Unfortunately the data say the opposite. By going slower, the result is lower-quality software, not higher. See the State of DevOps reports, and read the Accelerate book. We have a better way to build mission-critical, and safety-critical systems now. That is being adopted in medical devices, military devices, and space-craft as well as many others.
@@ContinuousDelivery Somehow we need to convince the regulators of that. We do test every piece independently with automated test suites and normal CI/CD systems. It is just that at some point a decision is made to release to production and all of that stuff is moved to a test server. We already know all the tests have passed but people need to go through and check many things manually before it can be released to production.
@@Immudzen I have seen that done too in various forms of finance systems and for medical devices. It is my argument that you can't be really compliant without this way of working. I outline some of those ideas in this blog post on "Continuous Compliance": www.davefarley.net/?p=285
@@Immudzen It seems to me that what you describe is an integration test. That is, you did test the "pieces" independently, but that doesn't guarantee successful operation of the entire construct (in case of some interpretation differences at the interfaces). So asking for an integration test seems reasonable. Now of course the interesting question is, did you ever have a failed test in the "test server" phase?
@@ContinuousDelivery lol - this is why more and more "Medical devices recalls" are due to Software failures - you guys are playing with people's lives - instead of asking the damn companies to spend more money on testing, you people are selling the snake oil that "less testing" is good etc. Are the developers competent enough to define the "contracts" very well so that there will not be more bugs ? Testing is the last place to catch the Issues before they reach production. But you people are blaming the E2E tests for the delays while the real reason is the lack of Investment by the software companies on Testing. Imagine a Car company deciding to do "Contract testing" and fire their quality department - that's what the software companies are doing nowadays and they are doing it because they want to earn more profits - and your "Over Engineered" solutions are not easy to understand and there is no real example to show that these things work.
Thanks for the video. Do your learnings apply to Selenium testers who automate end2end scenarios? Do they need to work with devs to mock the integrating applications that form the full system?
Nah. When you have some really bad dependency that may all of a sudden change how it works, at least checking the happy path in e2e proves to be useful. For the other things you can also have local component-level tests that run on the CI build. Just fake the adapters to behave in whatever way you prefer and test the rest of the system.
What frustrates me endlessly is that I have to explain this again and again and again in each project I join, while it seems so obvious to me. This approach works as well as it gets: if your code works against this isolated setup, it 99% of the time also works in production (after the initial setup of the interface), and you can fully test it with a behaviour-driven testing approach, which also provides you with "documentation" of how your system should work as a black box (or grey box). It's faster, it's more reliable, it's cheaper, frameworks like Spring and Quarkus provide all the tools, it can be executed locally... yet people still try to script giant test suites over dozens of microservices and run them all on every commit, and don't invest anything in isolated TDD/BDD tests. Frustrating. Consumer-driven contract tests are a nice addition, but they come with a lot of pain because the PACT framework is horribly unintuitive and many developers have a hard time understanding the concepts.
Imho this video makes a good point about why E2E is treacherous water and leads to a slow feedback strategy, then it backpedals by introducing multiple concepts where you should use E2E, and does it in a way that only programmers (and not stakeholders) might understand. That's very high-level criticism. I have seen very few experts ever make this argument within 20 minutes. But, imho, we still need a better language to address this paradigm. I tried to convince my project leaders about 10 years ago with a similar train of thought, and I lost them at "when we do it right, we don't need E2E anymore." Because they, rightfully, argued that we will probably never get it right.
I think of E2E testing as "tests for tests". Integration tests are tests for unit tests, E2E tests are tests for integration and unit tests (and contract tests and load tests, if in prod, don't @ me). Users are the ultimate tests for E2E tests. Each of these stages gets less granular and more prone to mis-specification or lack of interpretability, but each should inform the layer below when a problem is detected
I understand we don't need to do a full coverage with end-to-end testing and it's sometimes logically impossible. But it has to be there even as a smoke test. Because "stubbing" and "mocking" is sometimes an oversimplification and its inputs are based on our own imagination. We still need at least several test cases with end-to-end just to make sure we didn't miss anything.
I'm concerned about the simplification of having only one input and one output or just one upstream and one downstream systems. I know many examples of two-way systems. While system A may be the upstream for the request, it may be the downstream of the response. Also, a system may depend on many other systems and many more may depend on the first one. Finally, there's the state. Some systems may be stateful by design. Well, you always can re-design a stateful system as a stateless one by extracting state into another system, but then you should test that new stateful system as well, right? I'd really like to see how does contract testing work with multiple inputs, outputs, preconditions and postconditions.
There are so many definitions of E2E out there in the wild. For some it's simply talking to a GUI or API through some automated means. For some it's a test across the stack for a single application; there is the application-to-application kind of test, but it could even mean an infrastructure test. Heck, even a single ping from a client to a server could qualify as an E2E scenario. I'm glad to see the rest of the world is catching up with simply being aware of one-sided contract changes occurring without notice. If the contract is defined well enough, simply tracking the meta changes on the contract should be enough of an indication of things breaking somewhere on the data river across the corporate systems landscape. Sadly, more often the contracts are loosely defined, left open on purpose because it is hard to make a proper contract. And thus things break when garbage is pushed down the funnel, or when changes are made to what means what and when, causing trouble for the handling of actual work in progress.
Ignoring any 'change' done by a developer, you need to run automated end-to-end regression tests regularly ('daily'). Why? Uncontrolled change within the environments by third parties. 1) Your users' computers are being continually 'patched' for security and functional changes. (Remember Patch Tuesday!) 2) Your production systems are being continually 'patched' for security enhancements and network reconfigurations. 3) The browsers that your web applications run on are continually evolving functionally and being updated for security (and data gathering!). Running your tests picks up this 'uncontrolled' but deliberate change to functionality and security in the environment. Our regression tests have picked up these kinds of changes, where our software (although unchanged) has been affected by the environment that the system runs on and by the parts that run on the customers' computers. This allows the developers to make fixes for the changing environment. So if we need to run end to end just to keep up with uncontrolled change, we must certainly run end to end when developers make changes.
I usually call them acceptance tests and the way I test them is by mocking external services with mocking tools like Wiremock and dockerized infrastructure in order to test the contracts as is mentioned in this video
What about set of microservices owned by one team, communicating through both async messages as well as standard http request-response? Would you do only contract testing described here, treating these microservices as an external system in fact, or would you write full e2e tests, if you control the whole environment where these tests are run ?
It actually works fairly well if you are very diligent about IaC and test data generation, but if you aren't at near 100% automation, you're gonna have a bad time. In fact, our issue is it's so easy people spin up entire environments to test simple things, which is expensive unless everything is serverless. I think you are way underselling the effort to mock N apis for 'isolated' testing as well as ignoring what it DOESN'T test. You really want to test the 'glue' as well as the 'components' which mocking doesn't really achieve. When you have IaC mastered you can do things like bring up an environment that is dedicated to doing some long running 'nightly' job that requires several async components chatting through multiple pieces of infra (rest calls, MQs, api gateway, etc). It's hard to be overly confident mocking something like that.
Great response. Mocks are a really flaky concept and the entire take on contract testing is not a fair comparison to E2E. At the end of the day, the user is going to do something and expect something to happen. E2E makes sure the right stuff happens. Unit/Integration tests with mocks might be useful for maintaining independent functional pieces but how do you know if your entire system is working without tying the real production pieces together? How do you know your mocks are accurately simulating the real external service? Automated tests are a big burden to write and maintain.
What I describe isn't really mocking, as most people mean it. It is system-level mocking through the same interfaces that external components, sub-systems, and external collaborators use. So we are testing the 'glue'.
Well we were in production on a large system with thousands of users for over 13 months before the first defect was noticed by a user, so it's not all that 'flaky'.
@@ContinuousDelivery I don't see how that's possible. If you have 2 components that are coupled via an MQ, mocking both sides isn't enough. You also have to verify all the configs/permissions etc are in place as well. Further, how do you test that those permissions have the desired effect on the system w/o an environment to do so?
Wow... So I switched jobs recently (Jan 21). I used to work as a software engineer prior to my current job. Now I have to "manage" a project that basically shows all the flaws you mention here. The software is delivered by an external company. It is so frustrating that even the small things do not work... especially if you know how TDD works and how beneficial it can be. Yesterday we got a software version in which a very basic piece of functionality (logging a string at the beginning of a function) failed and led to an application crash, because of a typo in the log message! We are all human, and errors can happen, but that this version gets shipped is horrible. This project has lasted almost the whole time that I have been at the company, for two reasons: we deploy it on more machines with minor features added, and we fix bugs. And yes, bugs like the one I mentioned before. I spent my Christmas holiday implementing the software on my own, completely TDD, and I was done in a week. Sure, it was never tested, but the main features were implemented and tested. Technical test coverage 100%. Management does not really see a need to replace the external company... and I am already thinking about looking for a new job... since this is sooo frustrating. I know the question might sound dumb: are there any ideas how to reorganize this? @Kevin Dietz: You see you are not alone ;-)
@@Martinit0 It would definitely be an option. But i know how our company picks suppliers... i would not be one of them... especially if i quit the company and try to work on my own... super emotional picking process.
My optimal E2E test consists only of Docker containers, so that you can run it locally, inside a CI/CD pipeline, maybe even in production, in the same configuration. The image versions allow you to finely control the systems that are otherwise out of your control.
I would also argue NOT to do unit testing. Because most of the time we're not testing algorithms and equations. We're testing if a function call other two functions. No real meaning behind the test case. So what the average developer does is just set the expected to the actual because "he wrote the code and knows that what it's supposed to do". And now if there's a bug, fixing the bug would fail the test case. The test case would lock the bug inside.
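To make the distinction concrete, here is a small sketch; the pricing functions and numbers are invented for illustration. The first check only verifies that one function called another, which is the kind of test described above and passes no matter what the arithmetic does. The second states an expected result worked out by hand from the requirement, so it would go red if the calculation were wrong.

```python
from unittest.mock import Mock


def price_with_vat(net_pence: int, vat_percent: int) -> int:
    """Hypothetical pricing function (integer pence to avoid float rounding)."""
    return net_pence * (100 + vat_percent) // 100


def checkout_total(net_prices, pricer) -> int:
    """Hypothetical code under test: sums the VAT-inclusive price of each item."""
    return sum(pricer(net, 20) for net in net_prices)


def interaction_style_check() -> bool:
    """Only verifies that the collaborator was called; says nothing about the result."""
    pricer = Mock(return_value=0)
    checkout_total([1000, 2000], pricer)
    pricer.assert_any_call(1000, 20)
    pricer.assert_any_call(2000, 20)
    return True


def behaviour_style_check() -> bool:
    """Expected value comes from the requirement, not from running the code."""
    # 1000 + 20% = 1200, 2000 + 20% = 2400, total 3600 pence
    return checkout_total([1000, 2000], price_with_vat) == 3600
```

Whether this is an argument against unit testing or only against interaction-heavy unit tests is the open question; the sketch just shows that the two styles protect very different things.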
I do see some logic in the argument that if you can’t be trusted to test your interfaces, how can you end-to-end test… Where I do see some things to object to: teams that cross-check their software against standards simulators / checkers / verifiers still get pieces of software that don’t fit together. Team A and Team B both get a green light that they correctly implemented the standard and that their interfaces are OK. Once they are eventually tested together, you learn that their compatible implementations still contain small interpretation differences and do not work together. On my shopping list for “good enough testing” I would like software to be deployable in the real world with a good egress filter (stupid “I cannot install because I’m not in R&D network” errors) and an automated integration suite that at least shows red when the pieces don’t fit together. For some software “don’t do big pieces of muddy software” isn’t a good enough answer, since the different pieces are standardized in big standards with lots of room for subtle interpretation differences :)
Well, it depends on the degree of coupling, and if the coupling is high between these parts then the answer is to increase the scope of the system, with shared code between teams, and the scope of your deployment pipeline to match. Now you run all the tests (E2E for your SW) for all the parts of the system together, and release all the pieces together. Define the boundaries by what makes a sensible "releasable unit of software".
If "our system" (system B) was a service that consumed messages, did some work, and published new messages, would you consider system A the consuming, system B "doing the work" and system C producing the messages? Would you mock A and C, testing the logic of B given that, or would A and C not be different systems in my example, and instead be considered part of system B too?
No right answer here, it depends, but deciding on what the boundaries of your system are is important. I think that the key indicator, is "what is the scope of deployment?" If you deploy all these things together, they are part of a single system and should be evaluated together, if parts are deployed separately, they are not and are better treated as independent systems, maybe protected by contract tests at the points where they communicate.
There's so much truth in this, but the solution evades me. The complexity can be _immense_. In my experience with e2e testing, the biggest issue I have seen is when the _data_ changes - and this data is often _real_ data, not mocked. An example: a frontend application which makes a request to an endpoint to get data. It could be a request based on search parameters, as an example. The issue is that the data is in constant flux - it's being modified via other software. A product's availability may change, or the name of a product changes - it doesn't matter exactly _what_ that change is, other than it is _change_. With a great deal of frontend end-to-end testing, you could be using something like Selenium. You want to assert that when a search is done with specific parameters, you expect some data to be there. It's so fragile - and if you are implementing smoke tests as part of your CI/CD solution, the build fails when an assert fails - and that could be because that data isn't there _or_ it has changed. The title is no longer "ABC", it is now "XYZ". The data should _always_ be considered to be in flux. It should be expected that it _can_ change. Yet this type of end-to-end testing doesn't cover that - it blindly expects that when you do a search for a product with a unique identifier, you will _always_ get the same data back. So, to get around this, I guess you could employ techniques where the tests connect to _mock data_ - but what if the underlying systems that provide the real data have changed, but your mock data isn't updated to recognise this? - an additional data point has been added or its type has changed - or any other manner of change has happened. That means you have to find a way to automate the update of mock data. It's a horror show. The "solution" often ends up being half-baked - you don't check for _specific_ data, just the placeholder where that data would appear - the _markup_ that surrounds it.
In the case of HTML, you assert that, for example, a heading HTML tag actually has something in it - doesn't matter what, there's just _something_ there. That's fine, until the entire request fails because the product ID your tests are using is no longer there - and your software has handled this with a 404 or some other graceful fail. The end-to-end tests _don't_ know this. They timeout. The smoke tests fail, nothing gets deployed. Crazy. Horror show.
What if, in a distributed system, a certain workflow spreads across multiple services/applications? You still need a way to tell whether all the services together do the right thing.
If you need to test them all together before release, then they are a "single deployable unit" and so should be version controlled, built, tested and deployed together, and if you are aiming to do a good job, you need an answer to the question "is all this stuff releasable" at least once per day - that's what Continuous Delivery is.
These are great ideas, I wish I could figure out how to implement them with the stuff my team works on. We currently put a huge amount of energy into making full test builds of the product, which our work is only a small component of, installing it on multiple platforms, and doing live testing.
Try testing to the edges of your system, deploy it as if it was running in prod, but fake everything around it. Do this first for a single, simple, but real case, and in parallel with your existing testing strategy, do it as an experiment. The degree to which it is hard to fake at the edges of your system tells you a lot about your system's interfaces. Next try to make it easier to fake the interactions by improving the abstraction at the edges of your system. Build little defensive walls between your system and external systems if necessary (Ports & Adaptors). At some point you will see light at the end of the tunnel, because things will get easier. Now you have software that's easier to test AND software that is better designed.
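As a rough illustration of faking at the edge of a system behind a port, here is a small sketch. The names `PaymentPort`, `FakePaymentPort` and `CheckoutService`, and the checkout scenario itself, are invented for this example and are not taken from the video or from any commenter's system.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Set, Tuple


class PaymentPort(Protocol):
    """The narrow conversation our system needs with the outside world."""

    def charge(self, account_id: str, amount_pence: int) -> bool: ...


@dataclass
class FakePaymentPort:
    """Test double standing in for the real external payment system."""

    decline_accounts: Set[str] = field(default_factory=set)
    charges: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, account_id: str, amount_pence: int) -> bool:
        # Record the call so a test can inspect what was sent across the edge.
        self.charges.append((account_id, amount_pence))
        return account_id not in self.decline_accounts


class CheckoutService:
    """System under test: depends only on the port, never on a vendor API."""

    def __init__(self, payments: PaymentPort) -> None:
        self._payments = payments

    def place_order(self, account_id: str, amount_pence: int) -> str:
        if amount_pence <= 0:
            return "rejected"
        if self._payments.charge(account_id, amount_pence):
            return "confirmed"
        return "payment-declined"
```

A test then builds `CheckoutService(FakePaymentPort(decline_accounts={"bad"}))` and drives it like production code would. How awkward the fake is to write is a hint about how clean the real interface is.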
@@semosancus5506 It's not that it's a monolith, it's more that the way my team's product interacts with others is highly complex, and there are numerous organizational dysfunctions, such that even though we have thorough and robust automated unit and regression tests on the actual data and logic of our product, every time something unexpected happens, even if it was because of some other piece of the overall architecture, we internalize the need to keep our asses covered at all times. So we continue to keep this team of testers - who would all make great developers IMO - and spend 75% of our capacity each sprint on setting up a dozen test environments and running live tests.
@@transentient complex interactions usually indicate that there is a mismatch between the problem domain and the organization (not quite, but kind of Conway's law). That could also be the source of "ass covering" when things go wrong. Complex interactions allow teams to hide behind the complexity and run misdirection campaigns. I would look at stepping back, assessing your problem domain and organizing around the seams you can discern in that domain. Then give 100% ownership of a domain to a team and take away the ability to hide behind complexity. I have my teams work really hard to define a clear domain boundary and codify it into API with Contracts. Overall teams love it and are much more productive and happy. I can't tell you how often I hear "We screwed up" instead of "not our fault". To tie this back to this video and your original comment, if you can possibly do some org structure changes, then you will find the need to spend big on setting up E2E tests will drop. If you need to sell all this to your senior leadership, I would do some ROI computations on spending 75% of your sprint cap on this effort. If the ROI sucks, then you could propose your org try to carve out one well defined bounded context in your problem domain and solidify the API with Contracts. Then measure the number of bugs in these interactions vs. others.
I really like the ideas presented here and I have thought about them myself, but I would really like to see some references or documentation where the idea is expanded or where other people discuss it. I would like to share some of these ideas with my team, but just saying something like "hey, watch this, I really like the ideas in this video" and then sending a YouTube link doesn't sound very serious or professional to me.
“the code whisperer” may be opinionated but the articles contain references to other materials. “Some programmers insist on painting a wall from three metres back. I understand why they might prefer it (it seems easier), but I’ll never understand why they consider it more effective than picking up a brush” J. B. Rainsberger
what's the point of testing if my software works under situations I control? I want to know if my software will work out there in the wild, especially its critical business features. Of course that kind of testing has to deal with coupling problems: my system *IS* coupled to other systems. We shouldn't stop using a tool because it deals with problems it doesn't cause; only if its cons outweigh its pros.
Durability is something I found too many middy-engineers didn't seem to respect at AWS. Too many placed far too great a value on having a large number of tests, and on getting there quickly by flipping/adding insignificant variables and verifying that the most minor of details didn't change, without thinking about what or why, rather than adding quality tests. So much time was wasted fixing tests that tested something they never should have, generating noise and wasting effort, which ultimately makes the system less reliable as it becomes a practice for engineers to save energy and effort by ignoring some failures or not really taking a closer look at them.
hi dave. loved your insights as always. my question to you is: if we are doing these sorts of "end" 2 "end" testing, do we still need to write large unit tests, or are these sorts of tests enough to give us the confidence to engage in the continuous delivery paradigm? it's also good to note the amount of resources that we can spare and the outcome to drive given the time constraint on a project in very small teams as opposed to something you might be used to at Getco or ThoughtWorks 🤔
What about when infrastructure is automated? We can deploy a new throw-away cluster in minutes thanks to Infra-as-code (terraform + argocd or pulumi). Our pipelines can deploy any version we want (any service or all) also in minutes. There is no human intervention. This can even be used for load-tests as a stage in a pipeline if needed or the end-to-end tests.
Obviously automating infrastructure is a good thing, but it doesn't help you to control the variables in terms of the systems that are outside your direct control. Just as a thought experiment imagine testing the internet vs testing just your SW. There is a difference! The bigger the system, the less you have control in your tests. This is a fact, not opinion. "Bigger and more complex" is always harder to test.
I have stopped thinking about e2e testing now because of the video. Not because it's not important; rather, a single service will always interact with other services, so we should focus on different combinations of input and our expected output on the SUT, through request-response behavior testing and contract testing. That would be enough to make us confident when we see a green light after all the pre-defined test cases pass in the pipeline.
As soon as you start using a test rig or stub/mock implementation of external systems that your SUT is interacting with in an E2E test, you are practically rewriting unit tests, which is a waste of time.
Do you assume that E2E testing is the only testing ever done? I mean what happens to the use of unit tests, smoke tests, and user acceptance tests? I have never heard that E2E is a catch all for all testing, so if companies are doing this, then they have missed some very significant parts of the why and what testing is performed. I agree with everything you said, except for that part of E2E being the only testing done.
E2E testing is often, wrongly, interpreted by orgs new to automated testing as the easy alternative to doing a proper job of test automation. So it being the only form of testing is not a defining characteristic for e2e, but it is a common usage of it.
@@ContinuousDelivery So then that is a yes, you actually believe that E2E is used as the only testing and further you believe they use E2E as full coverage testing. I agree that is wrong if they are doing it that way. I am just skeptical that this is how testing is being performed with only the use of E2E. I am sure that you can find some companies some place using it like that, but I do not think it is the norm.
@@kennethgee2004 I don't think I said it was "the norm" but it is fairly common in big firms. It usually takes the form of them deferring all testing to a QA team. The QA team are now responsible for gatekeeping releases, and are overwhelmed. They read about automated testing, but are in the wrong place to do a really good job, because you need to build testing into the process of development itself, not try to add it as an afterthought. So they attempt to automate, or at least supplement manual release testing with automation, and you end up with over-complex, overly fragile, e2e tests as the only automated testing being done. Not everywhere, but an extremely common anti-pattern.
@@ContinuousDelivery Possible. I still do not see that a QA testing team is inherently leading to that type of testing. They can just be doing the other testing on a different team. The whole point to QA is to ensure quality. Is that not what a unit test performs? The quality of the tests is not determined by structure or paradigm, but how the testing is performed and on what they test.
@@kennethgee2004 You keep putting words in my mouth. I didn't say "inherently leading to..." I said it is a common anti-pattern, those are not the same things. I think that QA is often seen in the role of "quality gatekeepers" and I think that is a big mistake. To quote Deming, "You don't inspect quality into a product, you build it in". For software that means that "testing after" is the wrong answer, whoever does it. You don't get to real quality that way. TDD is a LOT more than a QA process. It changes how we organise dev and how we design code, for the better.
“Manual testing” seems to be “actually trying the damned thing”. What’s really interesting is how many people building (and testing) a product seem to be annoyed at the idea of actually trying it.
I agree, trying it is a really good idea, not always possible, but great if you can. But it is not the same as testing, and not even close to being "enough" for testing most systems.
@@ContinuousDelivery The false and unhelpful idea that testing can be automated prompts the division of testing into “manual testing” and “automated testing”. Listen: no other aspect of software development (or indeed of any human social, cognitive, intellectual, critical, analytical, or investigative work) is divided that way. There are no “manual programmers”. There is no “automated research”. Managers don’t manage projects manually, and there is no “automated management”. Doctors may use very powerful and sophisticated tools, but there are no “automated doctors”, nor are there “manual doctors”, and no doctor would accept for one minute being categorized that way.
@@ContinuousDelivery Testing cannot be automated. Period. Certain tasks within and around testing can benefit a lot from tools, but having machinery punch virtual keys and compare product output to specified output is no more “automated testing” than spell-checking is “automated editing”. Enough of all that, please.
@@ContinuousDelivery Also, I would love it if you did a video about testing with Michael Bolton from developsense, that would be an epic watch/debate between two very smart guys, still I'll dream on!
Is software testing & the practitioners thereof, really so far behind the curve that these kind of courses are actually needed?? IMO, all this particular material achieves is to cover off a test pattern (integration in its many & varied guises) that is well understood & has been widely employed for many decades, outside the software world - yes, there really is one ... in which we all live. That being said, it is encouraging to know that I'm not the only one holding & espousing, indeed evangelizing, this PoV.
Tests don't make quality software, developers do. Developers that have enough time to sit and think. The problem with unit testing, for example, is that you jump right into coding. Most of the time, not coding anything but thinking for a while first is much better, to see the edge cases by thinking.
No, it's not the same. Acceptance Tests test that your SW is releasable, that is it can be deployed into production without any more checking. Component tests don't usually say "it's ready to release", they may increase your confidence, but probably will require more work later.
Oh, this is bound to be a non-controversial take.
Got my click
The common in-practice use of “end to end” testing is quite different today than manual testing in a staging environment. It more often than not includes automated testing rather than manual testing these days.
It can still be brittle and slow but you do get quite a bit more coverage if you can stomach the wait times.
Conclusion is to do both E2E tests and tests in isolation. Not to exclude E2E testing; Many teams already do both.
Exactly. The TLDR is "use mocks sometimes" 🤷♂️
Conclusion: the man does not know anything about testing as he doesn't know about software engineering in general. See the last comment I posted. I really should open a channel to debunk all this bullshit.
@@biomorphic yeah. Lots of people who speak a lot about software development tend to want to think they have useful things to teach more than actually knowing what they're talking about. I watch stuff like this anyway for the occasion where they have something correct to say that I didn't already know. More often, I just learn more about the latest buzz words in software development.
@@biomorphic Do it. Put your money where your mouth is.
@@ThaitopYT I am considering. Maybe in a couple of months when I will have more time I will start a channel.
What I usually mean when I say E2E is a Puppeteer, Cypress, Selenium, etc type test that does things like "When Button A is Clicked by a User with Conditions X then API M is called with the correct parameters and the results will reflect the conditions." And similar things like "When API M responds with a error B we give the user a popup that says sorry the service is down." Stuff like that. For distributed systems you really need tracing of some kind plus some amount of unit testing or formal verification on the individual components.
Well, he makes exactly the case against that with contract testing.
Even for frontend tests, you can have a fake mock-up of your backend or backends, and just test against that.
That would form the frontend's contract with the backend.
Then in the backend you test that the contract matches reality.
@@lukasdolezal8245 I mean yes in general you use a mocked backend. You spin up a mock backend in a docker container or otherwise in a virtual environment somewhere and then run against that. If you have to run an E2E like that in a live environment for every environment even for just 2 environments that's going to be time consuming and brittle.
@@GoodVolition do you mind expanding on what your definition of "environment" is here?
@@lukasdolezal8245 There are several fields that would call it insanity not to do any end-to-end tests. Would you deploy a telephony system without making a single real call before delivery? It is not only deemed stupid; regulatory agencies demand records and proof of a large set of tests under REAL conditions. You may test every single part of your system, and each signal degradation step may be within parameters, but when you put everything together the interaction among those degradations creates an unusable result. So it is common practice to automatically generate a large number of calls across a large number of routing situations and compare the entropy between the signals at the input and at the output.
@@GoodVolition I think instances like this are better solved by stubbing your backend interface. It's faster to run, less work and your devs don't need to run another service to run the tests. It also reduces the headaches of having to mock authentication.
I know a place for which E2E testing works relatively well. It works relatively well as a temporary testing suite made for a legacy monster of a system before you set about refactoring it to make it more testable and add unit tests. It gets terrible when managers see that we have test coverage now, give themselves a high five, and stop giving time and budget to refactor that legacy codebase.
That is especially true because in a legacy system you do not always know what is expected at each point or component output, but you know with a larger degree of certainty what is expected at the end of a LARGER group of components or system.
What a breath of fresh air.
Good thing I found your channel. I was shocked how correct my professors at college were when they told me how often companies don't follow good practices, and often end up with a jumbled up system. They killed my passion, but I am in the middle of changing jobs. I hope the new place is better.
Thanks! Good luck in your new job.
spoiler alert: it isn't, it's never better ahahahah
No system is perfect @12q8. In fact there would be no fun working in a perfect system.
I worked at a major bank, end to end testing sucks.
Account attributes, API behavior, etc. need to be considered... pain.
I like a lot of the ideas in this channel. However, I think maybe a lot of them can only be achieved in rare perfect conditions where you have for example a.) Complete control of development related activities in an organisation b.) A team of top notch developers who all generally get on well with each other c.) People focused more on what's best for the team rather than their own career. In the real world though things are a big mess and you might have better luck herding cats.
I'm looking forward to your practical tips for how to better navigate our imperfect world; you clearly see aspects that can be better handled some other way, and the way we should be handling them.
choosing where to work based on the alignment with your values is an important part of looking for any job. Even if it doesn't fully align, you have every opportunity to present your case to other developers to win them over - I spend a lot of time trying to win hearts and minds, especially with some of these 'radical' ideas. At the end of the day your results will speak for themselves and you might get more buy in than you think. It's not always a lost cause, some people just don't know anything else
@@k0zakini0 oh, great idea, pick another job until you find a place where you can work by the book. Too bad most of us cannot do the same though.
Tim, interesting that you focused on the social side. Our issue is more that we have very few elements where the interfaces can be cleanly separated. Just a few and those are not the areas where we have problems.
I would have liked more information on the cost to build and maintain Dave's "test rig". By the looks of it, he's built entire fakes, but the trouble (aka, effort) with fakes is that they need their own unit tests to ensure they behave correctly. I do e2e testing on my REST APIs before I deploy, but the effort of keeping my test rig data maintained is driving me nuts.
In my experience, whenever I rely on mocking systems, the tests all pass, they are durable, everything is great... and yet in the real production system, things are breaking. The tests just adapt to the mocks over time, and so we aren't testing the real thing. Then, we have to increase complexity of the mocks, until... they are approaching the complexity of the app itself! It seems hopeless, so I just do e2e and do my best to still isolate things. Tough, very hard, automated testing is harder than building almost anything else. Still, good point of view and glad to hear it, thanks!
The commonest mistake that I see in this is not abstracting enough, or appropriately, at these boundaries that you mock. How can the mocks be wrong, different to production? If you define a contract and mock the contract, and then production behaviour differs, then by definition your contract isn't good enough. Practically, the mistake that people often make is to mock at technical boundaries, even mocking 3rd party interfaces. I think that is a mistake, because those APIs are big, complex and general. Abstract the conversation with your own layer of code, so that it represents the simplest version of the conversation between these pieces, and don't allow any routes of conversation outside of these interfaces. Now you can mock those conversations with more confidence.
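One possible reading of "define a contract and mock the contract" is to write the contract once as executable checks and run the same checks against both the fake and the adapter that wraps the third-party API, so the two cannot silently drift apart. The sketch below assumes that reading. `StockPort`, `FakeStock`, `VendorStockAdapter` and especially `HypotheticalVendorClient` are invented names; the vendor client stands in for a real third-party API and does not model any actual library.

```python
from typing import Callable, Dict, Protocol


class StockPort(Protocol):
    """Our own, deliberately small, view of the conversation."""

    def units_available(self, sku: str) -> int: ...


class FakeStock:
    """In-memory fake used by fast tests of the system under test."""

    def __init__(self, levels: Dict[str, int]) -> None:
        self._levels = dict(levels)

    def units_available(self, sku: str) -> int:
        return self._levels.get(sku, 0)


class HypotheticalVendorClient:
    """Stand-in for a large, general third-party API (invented for this sketch)."""

    def __init__(self, records: Dict[str, int]) -> None:
        self._records = dict(records)

    def query(self, sku: str) -> dict:
        if sku not in self._records:
            return {"found": False}
        # The vendor reports quantities as strings; the adapter hides that.
        return {"found": True, "qty": str(self._records[sku])}


class VendorStockAdapter:
    """Translates the vendor's general API into our narrow port."""

    def __init__(self, client: HypotheticalVendorClient) -> None:
        self._client = client

    def units_available(self, sku: str) -> int:
        reply = self._client.query(sku)
        return int(reply["qty"]) if reply["found"] else 0


def check_stock_contract(make_port: Callable[[Dict[str, int]], StockPort]) -> None:
    """The contract, written once and run against every implementation."""
    port = make_port({"tshirt": 3})
    assert port.units_available("tshirt") == 3
    assert port.units_available("unknown") == 0  # unknown SKUs are not errors


# The same checks run against the fake and against the adapter.
check_stock_contract(FakeStock)
check_stock_contract(lambda levels: VendorStockAdapter(HypotheticalVendorClient(levels)))
```

In a real project the second call would presumably target a sandbox of the actual provider and run less often than the fast tests. If it starts failing while the fake still passes, that points at the contract and not at the system under test.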
1000% -- It's refreshing to see someone discuss the things that have been on my mind for quite a while, and it's very difficult to get some people in organizations to let go of the false sense of security that large complex e2e or integration tests provide. They call it "quality", while it's anything but quality. What's more frustrating is that resistance is often combined with arrogance and unwillingness to learn.
Actually I'd venture to say it's the opposite of quality, because you are putting confidence in a methodology that has simply proven to be inadequate for testing large complicated systems -- especially today where each component may have connections and talk to multiple others, and so on.
Also, your videos have the best animations!
It’s sad to be in this situation. People have difficulty focusing on problems that really exist; they prefer to blindly follow standards like the test pyramid. So they follow standards and have a false sense of confidence instead of focusing on the problem and thinking about the simplest thing that could be done to prevent the problem from happening.
Excellent topic but could not disagree any more completely as regards to NOT doing E2E Testing!!
My main point of disagreement is your claim that an alphabet soup of automated tests (Unit Test, Integration Tests, TDD, BDD, etc...) run by dev/QA testers, with fake/mock endpoints as required, is all you need in order to do end-to-end testing.
I have been developing an algorithmic options trading system for the past 5 years now and the main end to end testing I do is always from a complete business workflow point of view as submitted by the end user which I do AFTER I have completed all of those automated tests that you mentioned of which I am in complete agreement with. A single end to end test for my algo trading systems is to: create all required reference data, open a trade , monitor the trade in the app with real time market data and close the trade exactly like I would do in a production system. No faking of any api's with mocks but definitely use third party testing resources as required with the only difference being configuration requirement changes as compared to production. Any data that is like production should be clearly defined as test data and this needs to be designed in from the get go.
In this way, end to end testing should be very simple, oriented around business user workflows, and exhaustive enough to verify at a minimum that the application meets the business requirements from the developer's point of view. Once this is met, then you can move on to business User Acceptance testing and production Pilot testing with a clear idea of successfully delivering a product that was clearly defined by the business.
At least for me, E2E testing should verify your business transactions in real time and how the system manages any error situations that could arise in production.
I think the key thing is just test at many levels, and test the right things at each level. Basically the testing pyramid.
Unit tests have a place, integration tests have a place, contract tests have a place and even end to end tests have a place, but don't use end to end tests to test things you can test at a lower level.
For example, your end to end test might just be a simple happy path that you can use during canary or blue/green, which can do things that contract tests can't, like verify the infrastructure and services together.
This. Your answer puts it much better. There are things for which you NEED E2E tests, so do it for those. Just try to use a smaller scale for things that can be caught at that scale. A simple example is system performance: E2E testing is the only kind that will give you a REALISTIC measure of the performance of a system that is asynchronous, parallel and has several dozen stages.
It depends where you think the "end" is. If I am testing the performance of a financial exchange, I don't need to also test the performance of clearing and user-registration. So those external systems can be faked for most of my performance testing.
@@ContinuousDelivery The further you move the "end" towards the user's interface, the better you find actual problems. Your financial exchange might take 2 seconds, but be shown in limbo for minutes due to aggressive caching in the frontend.
100% it's a shame that Dave chose to use the inflammatory "don't do end2end testing" rather than the more advisable - test appropriately at each level of integration.
@@Ash31415 Sincerely, I think that is the problem with some of his videos. The title of a book sets the whole tone of the reading experience.
Not what I expected to hear, I missed the fact that E2E tests are very valuable for monitoring.
You don't want your users to detect that the checkout process is broken, you should detect that early with E2E tests. Run them continuously on all environments, even production, to validate business critical paths work.
Of course E2E tests fail from time to time. That's expected. Software updates on the user's side or API changes of external systems happen all the time. It's our job to build software that is resilient and keeps working. And if it doesn't, we should be informed swiftly to look into it.
In the end, neither your unit nor your contract tests matter if you don't recognize that your app is broken for end-users due to a browser update or a tiny change in any external system you don't control.
You should be able to trust tests lower in the pyramid. If you can't, this needs addressed.
@@-Jason-L Absolutely, but what's suggested here is to ignore the top of the pyramid
@@-Jason-L Should, but you cannot! If you are making a website that sells t-shirts you can risk it, but if you are making systems where a single minute of downtime can cost millions or worse, LIVES, then you CANNOT TRUST ANYONE'S word.
You really are supposed to have the least amount of tests in E2E, instead many developers argue to put everything in E2E because the code sucks and "UTC is hard." It's very frustrating.
If you actually have good code and good unit tests, E2E replaces simple happy-case sanity on app, it's just sanity.
It's a lot of work to get E2E right (don't mock the backend, that makes no sense); I would suggest literally taking an hour or two and doing that sanity check manually.
Manual sanity is needed anyway. Most companies can't coordinate between conflicting features, the AC is correct for everyone but the user.
Seriously, E2E is not the answer.
@@johnjackson6262 It is not the answer, but NOTHING is the answer anywhere in software development. Everything is a tool, and tools are to be used when they are deemed the correct tool for that specific moment. The simplest example is ensuring the performance of the system is still within acceptable parameters. You can only be sure about the performance in an asynchronous parallel system when you run the system in a complete state.
These are wonderful ideas but every organization I've ever worked for was way too dysfunctional to ever come close to methodical testing.
It requires educating management about the engineering realities.
it's your responsibility to make the organisation less dysfunctional (you are also the organisation). You are doing the work, you know more than the org about how best to test your code. Developers have more power than you think to drive change, especially in dev practices... start using it
@@k0zakini0 In general I agree. I have tried to do that to the extent possible. Usually I find myself as a newcomer in an organization that is already large and mature. It's much harder to move the needle in situations like that. Oh how I would love to lead an organization to the Promised Land, but sadly, that just never seems to be in the cards. That's why I work on my own startup.
@@kdietz65 see my comment on the root of this. Your approach sounds appealing!!
@@k0zakini0 they are still employees; they don't really have a say, especially in things which could add short term costs to production. This is the reality of our economic system.
I recently started implementing Cypress in my team's app of my own volition, and luckily this is an endeavor the company would like to move forward with. Doing it really showed me the value in system tests where we can go over every single circumstance, whereas our e2e tests only view the very limited happy path. E2E really started looking completely useless to me outside of accessibility testing and the most basic of pipeline checks for PRs.
Dave, just wanted to note that the thumbnails are starting to border on cringe. I understand the "nature of YT and click-bait", but for God's sake, there is a limit. We come here because of your competence and level of delivery - not because of a weird click-baity cringe thumbnail showing some exaggerated and fake reactions/emotions.
Unfortunately the slightly click-baity titles make a huge difference to the reach of the videos. I would like it if we could do more straight-forward titles and thumbnails, but if I want my videos to be watched, and I do, then that is simply not how YouTube works these days.
@@ContinuousDelivery Thanks for the reply. Turns out I will need to close my eyes next time and make a click. The chasm between the style of videos (which are professional and to the point from the first second) and thumbnails is so large my brain is getting fried. The only thing left is to cry about the future of humanity as species if we need to do things like that to educate people.
We replace the word « system » by « object » and it becomes the definition of a unit test. Unit tests, integration tests, acceptance tests and e2e tests all look the same. A controlled input to assess expected output. The only differences I see are the relative scale and the tools used to control inputs and assess outputs. What am I missing?
We can easily imagine acceptance tests of a library/package barely being an integration test of another more complex system. The same way we could imagine unit testing a group of external systems from a system substantially bigger. Like if someone decided to test the internet. Maybe there is a system where we unit test the internet
The "only" difference is relative scale. But that is on every metric. It increases chances of false positives and false negatives, it increases the burden when setting up your controlled input, and it increases the burden when filtering through your output.
You may as well say that a unit test is the same as measuring the electrical signals coming across your ethernet cable - the only difference is the scale of the data you have to send and sift through, and there's lots of inconsequential changes that could cause a test (or heck, all of them) to fail.
Or taken to the extreme, the input and output is every atom in the universe, we simply have to control the state of the whole universe before running our experiment then measure the state of the whole universe after it is done.
Exactly and how do you test something that has inconsequential changes that could cause a test to fail? How has the world not collapsed already?
I think this depends a lot on the domain. My background is in air traffic management systems where it is essential to test in isolation using simulators, which are themselves complex systems requiring their own testing (but can double up as a training tool for end-users). This is in no way a substitute for E2E testing though, which can, and probably will, reveal issues that escaped all prior testing. The importance of this level of testing is such that a new system will often shadow a pre-existing system for some time before becoming operational. This is obviously the most expensive time to discover an issue, so if there is any way of incorporating some degree of E2E testing earlier in the project it could be very valuable.
Well in reality that end-to-end test is going to happen one way or another. You just have to decide whether you want to do it before the customer does.
Your proposal is ideal but requires every layer to be on board. If you have a system of 15 layers and 4 layers have little to no tests then writing a single all layer e2e test is more cost effective than writing 4 tests and will still prove that the whole thing can be shipped. In this way full e2e are used as a stop gap.
14:40 I think you mean that the API can't be trusted and the solution is to better specify your APIs which I agree with. If you mean the network can't be trusted I don't think there's anything you can do about that.
For software engineering this might be the correct approach.
In data engineering and big data I feel we must do end to end testing because I have to test all of the system under stress
One director I worked for a time ago seemed to be fundamentally against anything that streamlined the development cycle, because he thought if it was too easy then his project would attract lazy developers. I found that mindset quite discouraging.
The kind of neolithic thinking that would have opposed the stick and wheel
Lazy is a good thing. I'm lazy so I want to do things fast and right so I can rest.
I like to explain the fundamental problem of testing in general to teammates with the concept of "feedback time", i.e. the time between a mistake being made and the developer learning about it.
And the goal is to constantly minimise this feedback time across the organisation.
Contract tests and fakes at the interfaces are certainly very good tools to achieve smaller feedback time, when it comes to more complex systems.
Yes, exactly, I think that is exactly the right focus.
@@ContinuousDelivery How would I answer the objection from a co-worker:
"Certifying on mocks or integration tests would not be sufficient and that would lead to more bugs surfacing in production. For such a complex product with multiple systems working together, it's best to test as close to prod as possible i.e. prod or an env which is a close replica. Without that, there is going to be very little confidence in what we certify."
I like your scientific reasoning approach to testing. That being that tests are experiments that test, pun intended, our assumptions about the code.
Yet that is not enough. For a client the product is the interaction of the code with the environment. So systems that operate in complex scenarios that cannot be isolated (telecom for example) are usually in need of E2E testing in live scenario to evaluate extraneous factors.
@@tiagodagostini I am pretty sure that was the point of @Victor Martinez - pretty sure it should be read as "That being said, tests are experiments..."
I love this channel, but I completely disagree with what you're saying at 9:35, "we want to deploy our system". No, we want the whole thing to work. If we are on PROD but the whole scenario is not working and customers cannot use it, what is the point? Integration is the most challenging part nowadays, but this way of thinking puts the whole idea at risk: we are on PROD (system B), we don't care about system A, and if they are not on PROD it's not our problem. Yes, E2E is time-consuming and so on; I agree with everything you said there, but the proposed alternative does not work either.
E2E tests are pretty useful before rewrites / refactors. You scaffold your app with tests, and then you have more confidence going into changing parts without breaking things.
Maintaining constantly e2e tests however… usually just yet another useless layer to keep engineers busy…
Nowadays when I think of e2e testing, I think automated.... like Cypress. So far a much more enjoyable TDD experience compared to any web unit testing I've done, so much more efficient than manual testing.
Manual staging environment testing is very expensive and I've only seen it lead to arguments between the testers and devs.
It may seem like a disadvantage that your e2e testing environment is dependent on other teams' software that is in flux, but if you don't test the interactions of different systems that are changing somewhere, you'll be testing them in prod. I've recently had another system that my own depends on change a response to an API call in QA without ever informing us. Our e2e environment tests caught this, and we discussed it with them and got in sync again. If we'd just faked their responses out, then the issue would have appeared first in production. I'm not sure it's as clear cut as this necessarily. In environments with a lot of different teams/services that all interact, things will be messy any way you shake it. There can be something to be said for not trusting other teams to communicate perfectly regarding changes to their systems that affect you. I would rather things be messy in testing than have to point fingers and pull up receipts after a live incident.
Watch the last part again. The trick is to utilize contract tests to always be sure your assumptions about the other system are correct.
The whole thing is to be able to run the tests of your own software as fast as you can, while retaining the same confidence (and more) in how it interacts with other systems
I think in an ideal world, most people would prefer to use contract testing & something like WireMock to have faster running tests & to just always trust that collaborators will always work as we expect them to. Unfortunately in a more realistic world, I think E2E tests that exercise real collaborators DO provide value despite their overhead & brittleness. Almost as a form of a smoke test for critical paths before a deployment which is why it also probably makes sense to keep them low in number, so less to maintain. Most workplaces IMO would probably benefit from supplementing their unit tests with both the types of tests described in the video (higher level, black box tests with mocked collaborators) AND the more traditional E2E type to provide higher levels of confidence, versus completely ditching traditional E2E tests. So like most things in software, to do E2E or not to do E2E... it probably depends 🤷♂️
As I say in the video, I think that the best scope for testing is "a releasable unit of SW" and you test that in a deployment pipeline all the time, so there is no need for smoke tests, because you test the whole thing properly, but then, that is the system that you are responsible for - all of it, shared code ownership and all. You can, and should, do this as a collection of small teams. But scope of release and acceptance testing is a releasable unit.
My own experience of owning a low-code integration platform where we have different APIs (about 50-60 different vendors) we integrate with suggests that both approaches have validity. WireMock is a great tool and something we use extensively. This provides confidence that the captured state (which the test uses) can be tested. However, at the same time nothing beats testing with the real API. We've had a few instances where something subtle has changed on the vendor's API AND bang! I honestly doubt that a stubbed test would have caught these instances, whereas a test against the actual service MAY have 🤷‍♀️ The level of testing is of course driven by the value of the API/connector (some have much more use than others) and a feeling... if we think the vendor's API is going to suck/be brittle/change without notice then we try to add a test to attempt to catch that.
@@jimiscott yep, what you're saying heavily mimics one of the problems my most recent place had. Lots of WireMock (great tool but won't - as you said - cover all scenarios) but nothing that catches the BANG scenario you described! Agree that the right answer is to probably run with a mixture of both types of tests as they cover us for different things.
@@ContinuousDelivery "A releasable unit of SW" - Ideally consists of "All" the Systems -> not just System B - You are going against the Agile's feature team concept here - A team is responsible for delivering a value to the end user - Not just a front-end or a back-end. If you are only responsible for the "Front-end", you may use the contract testing - but then the team is not delivering any value to the end-user - they are not delivering any User story - just a bunch of code useful for nothing
@Marck1122 If multiple releasable units only together constitute a new release, they are not multiple releasable units and should be treated as a whole. It is better to separate them up to allow individual progress. A single unit should be releasable independently, and the only thing you can properly test for in that case is the unit itself. Dave here promotes a mix of contract tests to validate interaction with other units, and simulated acceptance tests to actually test all possibilities (impossible with the actual service)
Not sure if the recommendation is to convert e2e into Integration Testing as much as possible?
Maybe better advice is to minimize e2e tests and capture max value through other forms of tests. Not doing e2e tests is bad advice.
I've generally found that mocking system interactions for testing doubles the workload placed on developers. In a perfect world everyone shares their test cases, but even then it's much cheaper to have people write proper documentation and public interfaces - if they can't do the latter, they definitely can't do the former.
Imagine if all consumers wrote tests to submit to all software libraries that they used to inform the publisher about how they were using the library.
I have to say, after listening to this and creating unit tests & integration tests for each system and E2E test to test systems working as expected. I realise that my e2e tests actually test high coupling between the systems.
Always good to listen to these videos, even if I don't agree with some of the points :)
Great video, thank you! From my experience, all projects still are using E2E testing under smoke or regressive suites.
Your description of "system tests" seems more like integration tests to me. The SUT is put in a simulated environment and expected to "integrate" (work with) the connection points presented to it. System testing -- actually testing the whole system -- should test only those aspects of the system that can't be duplicated in a test environment. A full system test basically should be a combination of functional testing (does the whole system work?) and school of hard knocks regression testing (Bob forgot to configure this once, so I guess we have to test for it now). System testing should strive to be minimal and lightweight. Too much system testing results in the bad system you describe: a ponderous final test environment that leads to too much surprise late in the testing cycle and makes releases more difficult than they should be.
I think we need to redefine what end-to-end tests mean.
Specially because in several fields the clients DEMAND proof of end to end testing. When someone says do not do what your clients DEMAND, there is something wrong.
@@tiagodagostini If someone would pay you to run a marathon but demands that you balance a pineapple on your head the whole time, you'd probably tell them to get lost.
@@grrr_lef well if I have no time limit and the pay is really good I might try it :P Also more on the subject.. you cannot tell your clients to get lost. That is not how successful companies work, despite what Hollywood might think.
15:55 “for each we faked it when acceptance testing … our stub that represents an external third-party system … from the perspective of our system under test it doesn't know that this is all faked”
Likely my confirmation bias but I'm noticing the absence of any term starting or ending with “mock”.
17:07 “we ran our own contract tests against each end point, usually the endpoint's beta test site … we sometimes had contract tests that would fail against their beta site and we'd change our integration with it once we understood the change. We'd do this by first changing our simulation in our tests [i.e. test double]”
The “drifting test doubles problem”. It may seem like a lot of additional (testing) effort but those tests capture “our understanding of how the RealThing™ works” so there is value of having the feedback that it's no longer accurate.
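A rough Python sketch of that feedback loop, under stated assumptions: the same contract checks are run against our own simulation and against a stand-in for the third party's beta site. `StubGateway`, `DriftedBetaGateway`, `authorize`, and the response fields are all invented for illustration; none of them come from the video or from any real API.

```python
class StubGateway:
    """Our simulation of the third-party system, used in acceptance tests."""

    def authorize(self, amount_cents):
        if amount_cents <= 0:
            return {"status": "rejected", "reason": "invalid_amount"}
        return {"status": "approved", "auth_id": "stub-0001"}


class DriftedBetaGateway:
    """Stand-in for the real beta site after an unannounced change:
    the 'auth_id' field has been renamed (hypothetical drift)."""

    def authorize(self, amount_cents):
        if amount_cents <= 0:
            return {"status": "rejected", "reason": "invalid_amount"}
        return {"status": "approved", "authorization_id": "beta-9f3c"}


def check_contract(gateway):
    """Run our assumptions about the endpoint; return a list of violations."""
    violations = []

    approved = gateway.authorize(100)
    if approved.get("status") != "approved":
        violations.append("positive amount should be approved")
    if "auth_id" not in approved:
        violations.append("approved response should carry 'auth_id'")

    rejected = gateway.authorize(0)
    if rejected.get("status") != "rejected":
        violations.append("zero amount should be rejected")

    return violations


if __name__ == "__main__":
    # Same checks, two targets: the stub we own and the (simulated) beta site.
    print("stub:", check_contract(StubGateway()))
    print("beta:", check_contract(DriftedBetaGateway()))
```

When the beta run reports a violation while the stub run stays clean, that is the signal that the test double has drifted and needs updating before the integration code does.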
How do we know if the dependent system breaks our contracts if you are always testing against mocks ?
I’ve seen the approach of spinning up all dependent services in the docker network with testcontainers. So you always test against the real versions. If a system B that your system depends on has a new version released, you just bump the version that you are testing against. That way your functional/acceptance tests in the pipeline will give you faster feedback.
But I’ve also seen that not working if system B depends on C that depends on D… and so on. Then it either becomes a big ball of mud, or you start to mix wiremocks and real versions of services, which I still find reasonable compared to mocking everything.
IMO, mocks should just be placeholders to help write tests.
E2E is actually testing the system while tests using mocks are very close to simulation.
@@mecanuktutorials6476 Agreed, but I’m actually adding that in the context of contract tests, rather than E2E
Contract tests validate services are returning what the clients expect. Integration tests validate that systems are actually calling each other as expected. So you are not always testing only with mocks.
I think the title is incorrect. There is nothing wrong with E2E testing if you automate it, and results are evaluated automatically. In my company we do massive E2E load test, to precipitate any problem much faster than any customer can. The application is subjected to this test after any kind of changes, every time. We fire 1000s of transactions simultaneously, doing every possible operation non-stop for an hour. For major changes, we run for 24 hours non-stop. If nothing breaks, nothing leaks, then it can move to release state.
I'd agree. I think how much E2E testing you can/should do - or even what the term means - depends on many factors. We certainly should not depend on E2E testing only, and no test - no matter what kind - can guarantee 100% correctness, but doing no E2E testing at all? Good luck.
Big ball of mud, exactly what I am dealing with, it is so difficult to get a hold of the state of the application. Every test returns a slightly different result than the one before. Extremely difficult to test.
Basically no organizations achieve the ideal. When we evaluate idealistic recommendations like this one, we have to question: “if we don’t make it all the way, is it still valuable?”
The answer here is unequivocally no. E2E tests are plainly, clearly valuable all the way until you achieve your contract perfection, which for practically everyone is never.
This is not a good recommendation, because it’s all or nothing.
I've developed an internal CI/CD pipeline for a few years, and he's mistaken: CD is not what "makes sure the software is in the deliverable state", because that's actually the goal of CI.
In contrast, CD takes a version that passed CI and automatically deploys it into production.
All kinds of tests are useful as long as you know when and how to apply them. End-to-end testing might be painful but is sometimes crucial and unavoidable: for example, we can use end-to-end tests when developing "minimum viable prototypes" to test in practice whether the basic assumptions for the grand architecture truly hold, so that we can decide early on whether it's worth investing further in a system or whether it's doomed to fail. All kinds of tests have their place in the grand scheme of things.
I feel like Dave mainly describes API-based systems, where it all stays true. UI end-to-end testing is only fragile because you are using a unit-testing approach to write end-to-end tests. @testrigor allows you to write tests 100% purely from the end-user's perspective with zero reliance on details of implementation. This way some of our customers run tens of thousands of end-to-end flows with 0 flaky issues many times a day.
We are currently on the go to implement automated e2e testing. Namely to test regression of stabilized API's.
I'm already concerned about the setup and cleanup needed, since the test app will need to set up and clean up after each test.
That's fine when it works, but it introduces a potential failure state during the test. That can be cleaned up automatically when a test hangs, but that's additional complexity.
Our QA engineers might be stressed by the maintenance complexity involved and require additional dev time.
However, the promise is that regressions could be tested by pressing a button. Namely, the regression tests are not part of the pipeline; they just help testers run through previous behavior while leaving them free to test the new behavior manually.
Yeah yeah, manual testing, I know, it sucks, but we're trying to do the best we can with what we've got.
However, Dave here gave me pause for thought.
Not sure if I got it right regarding contract testing between systems.
But.. technically it would be possible that system A exposes its part of the contract.
And system B fetches the contract input and runs the test with it, testing the output or state change?
Hence each team maintain their own set of contracts and don't care about the rest.
Maybe that sounds overcomplicated too, and what Dave means is just "unit testing" at the scope of a deployable unit, without any outward communication whatsoever.
But that scares me, given that such tests may give false confidence, as they rely on people communicating at ad hoc times. That seems to break the whole determinism promise, and the breakage will only show up in the prod env.
Anyone care to explain in more detail maybe? What I got wrong?
I agree with you, but only, if we are talking about a perfect world where the company has a very and well-developed culture of testing, QA, and CI/CD runs in a perfect scenario.
For all other realities, E2E testing is a necessity. I really wish that things work so well that we could just drop using E2E tests, I truly agree with your arguments, but most companies are not yet ready.
Neither QA nor CI/CD is required to stop relying on e2e tests. All that is required is proper tests lower in the pyramid.
I like the idea that we can use a range of different types of test, and through all of them we will gain confidence in the system. Basically if you only ever tested each piece of the system on its own, i'd not feel confident releasing that without at least some sort of e2e testing that tests that they all gel together properly.
Still very interesting points made in the video here, I especially like the discussion about how systems a,b,c relate to each other test-wise, i suppose this is where the ideas in mockito might come from
One cool thing when it comes to e2e testing is the idea that if you carry out a handful of business scenarios that together cover all of the pieces in the system, and each one has several tests to go with it, if they all pass then its unlikely there's something seriously wrong, as each piece would've had to do its job successfully when interacting with other pieces in order to produce the right result multiple times. it is always possible that you've got bugs that don't affect the result, or multiple bugs cancelling each other out, but this gets progressively less likely with each test scenario.
you sound like you work in a perfect world where there's tons of time to write and test every aspect of the software. the rest of us who works under the gun, limited resources don't have your luxury to do that.
So its --- Write proper APIs, write proper clients, and proper mocks on how the system acts when sent certain inputs --- then share it with the other teams, so all one needs to do is use them properly while developing system-under-test (what to mock, and what not to mock). Please tell me if I got this right.
I already have been in a scenario where the tests have been skipped as a management decision, while also a lot of pressure has been put on the (mostly inexperienced) devs.
Of course it failed quite spectacularly.
One of the things that was done while developing was to at least implement a lot of unit testing. It was crude and partially more creative than really useful, but it provided a quite high coverage around 90%.
One of the worst bugs was a saving process which triggered multiple processes via the frontend, which in the end used the same storage, resulting in them overriding each other's data.
The individual processes all had perfect coverage, but they hadn't been tested together, which led to the bug reaching production with quite terrible consequences.
A very simple e2e test would have caught this immediately (and of course we implemented one, along with a few others).
If you build a proper pipeline it should not be a problem to implement an automated testing stage that at least simulates a user going through the critical paths to make sure that there is no oversight in the combination of some new features.
If any bug is found which is the result of one of these combinations, you should also add one more of those tests to make sure that this will never happen again.
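To make the failure mode above concrete, here is a small Python sketch. Everything in it is hypothetical (the `Storage` class, the two save functions, the "profile" and "settings" data); it only mirrors the shape of the bug described: each process passes when checked alone, and only a check that drives the combined save notices the lost data.

```python
class Storage:
    """In-memory stand-in for the shared storage (hypothetical)."""

    def __init__(self):
        self._records = {}

    def write(self, key, record):
        self._records[key] = record

    def read(self, key):
        return self._records.get(key, {})


def save_profile(storage, user_id, name):
    # Writes a whole record: fine in isolation, destructive when combined.
    storage.write(user_id, {"name": name})


def save_settings(storage, user_id, theme):
    # Also writes a whole record, replacing whatever was stored before.
    storage.write(user_id, {"theme": theme})


def save_settings_merging(storage, user_id, theme):
    # One possible fix: merge into the existing record instead of replacing it.
    record = dict(storage.read(user_id))
    record["theme"] = theme
    storage.write(user_id, record)


def save_everything(storage, user_id, name, theme, settings_saver=save_settings):
    """What a single 'save' action in the frontend triggers."""
    save_profile(storage, user_id, name)
    settings_saver(storage, user_id, theme)


def critical_path_check(settings_saver):
    """Combined check: after one save, both pieces of data must survive."""
    storage = Storage()
    save_everything(storage, "u1", "Ada", "dark", settings_saver)
    return storage.read("u1") == {"name": "Ada", "theme": "dark"}
```

Here `critical_path_check(save_settings)` comes back false even though each save function does exactly what its own test would expect, while `critical_path_check(save_settings_merging)` comes back true.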
I'm currently in the same situation: inexperienced devs and management ignoring testing
E2E is not complicated - It is what the real customer or the end user is going to do when they use the product. E2E is projected as complicated by vested interests -aka companies that don't want to spend money on testing and lazy developers who don't want to take accountability for their bugs. I have seen enough of crappy apps and websites in the last 10 years where even the basic scenarios are failing but I'm sure the development teams had "contract tests" like the ones mentioned in the video that are passing and the pipeline was "Green". In such scenarios, I have seen the development teams don't even want to take accountability for the Issues and simply pass on the blame because they are not responsible for the full system. Over Engineering is killing Software quality and the snake oil salesmen are selling "don't spend money on testing" ideas to the companies that can ultimately lead to loss of business.
> One of the worst bugs was a saving process which triggered multiple processes via the frontend, which in the end used the same storage, resulting in them overriding each other's data.
It's a hilarious example of unit test blindness.
Thanks!
I've tried googling "software testing rig" and I'm surprised how little there is on this topic!
I explain it in some detail in my new training course 😁😉😎
@@ContinuousDelivery Please! Overview of what "software testing rigs" there are. What are the important aspects to look for.
If we're practising continuous deployment we ought to be able to continuously deploy a staging environment.
Interestingly enough, this is a debate we're having at my work now. The problem is, it doesn't matter how tightly you control for your upstream dependencies when they change on you without letting you know. This happened between 2 of our teams and resulted in a bug that reached production, caused a feedback loop when purchasing under certain circumstances, and resulted in customers being charged multiple times for a single product. There's been some back and forth on how much emphasis we put on tests that pull several levers to check for regressions.
Lets say the user knows about the full system only and has no knowledge about A, B, C. If technically possible to test the full system, Why would you spend time on these systems individually instead of the full? And why should they do more than they need to?
I understand there are technical issues usually but just theoretically. Maybe A, B, C is just because of legacy or bad design. Maybe it should next year be just A….or A, B, C, D, E, F, G…. Point is, different levels need different approaches and strategies. Too easy to say that E2E should go, but ofc, selfish engineers that ”own” B would love it 👌😅
@@carlbergfeldt818 E2E is typically way more complicated to do than smaller scale testing, and tends to steal resources from more targeted testing. That combination tends to lead to deficiencies that aren't noticed - which pop up later as really difficult to troubleshoot bugs. You have way more visibility at the boundaries of the individual components/systems.
It also tends to lead to undocumented changes in the specifications of the interfaces/boundaries because teams are trying to 'just make it work' rather than pushing back on the team that actually introduced the bug.
@@rich63113 E2E is not complicated - It is what the real customer or the end user is going to do when they use the product. E2E is projected as complicated by vested interests -aka companies that don't want to spend money on testing and lazy developers who don't want to take accountability for their bugs. I have seen enough of crappy apps and websites in the last 10 years where even the basic scenarios are failing but I'm sure the development teams had "contract tests" like the ones mentioned in the video that are passing and the pipeline was "Green". In such scenarios, I have seen the development teams don't even want to take accountability for the Issues and simply pass on the blame because they are not responsible for the full system. Over Engineering is killing Software quality and the snake oil salesmen are selling "don't spend money on testing" ideas to the companies that can ultimately lead to loss of business.
In the test case with invoice rejection, I would add an assertFail() call after the line which is supposed to throw an exception. This way I make sure the test fails if no exception is thrown.
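A minimal Python sketch of that pattern, assuming a made-up `submit_invoice` function and `InvoiceRejected` exception (the actual test from the video is not reproduced here):

```python
class InvoiceRejected(Exception):
    """Hypothetical exception raised for a rejected invoice."""


def submit_invoice(amount):
    # Hypothetical system under test: rejects non-positive amounts.
    if amount <= 0:
        raise InvoiceRejected("amount must be positive")
    return "accepted"


def broken_submit_invoice(amount):
    # Buggy variant that silently accepts everything.
    return "accepted"


def check_rejection(submit):
    """Passes only if submit(-5) raises InvoiceRejected."""
    try:
        submit(-5)
    except InvoiceRejected:
        return  # expected path: the invoice was rejected
    # Reached only when no exception was thrown, so fail explicitly.
    raise AssertionError("expected InvoiceRejected, but nothing was raised")
```

Without the final `raise`, `check_rejection(broken_submit_invoice)` would pass quietly. In practice `unittest`'s `assertRaises` or `pytest.raises` gives the same guarantee with less ceremony.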
E2E tests find incorrect assumptions between parts of the system. I would agree that you don't want more than a few E2E tests because if you haven't got a grip on interface contracts, E2E tests will be fragile and can become a serious speed bump until the revealed problems are addressed.
I'm not sure how we can implement ATDD in a microservices architecture without essentially conducting end-to-end tests. If our acceptance tests require interactions between all our microservices and the frontend, doesn't that mean we are essentially performing end-to-end testing?
I try to limit "end to end" testing the way you talked about it here to simple smoke tests. somewhat questionable in terms of utility at times, but they do catch if something goes super wrong in deployment. The only huge drawback is if it fails it tends to be more difficult to debug, and it almost always indicates a test gap somewhere more fine-grained.
Yeah, I advocate for a small number of customer-focused smoke tests that feed well-formed bug reports upstream.
I've always despised the definitions of e2e and integration tests, as they don't explain what they are and are so open to interpretation.
I would say it depends.
In a monolithic old-style application, it is easy to demonstrate the benefits of isolated, fine-grained testing like unit testing.
But let's talk about distributed micro service based architecture.
The micro services themselves (if they are "microservices") are small pieces of code that have a well defined , unique, and very limited business purpose. Testing those e2e could be reasonable.
Now let's take it further and suppose you're using something like Flink for your streaming implementation.
How do you test this kind of application? You cannot test your "streaming logic" outside of flink since it is pretty much implemented in Flink's terms. All the 'filter', 'map', 'keyBy', 'windowing'... you name it.
You have no other choice but to execute a flink application e2e in order to see the results and validate them.
I talked about this issue with Flink's maintainers and they have agreed that Flink's architecture does not allow you to isolate the definition of a stream from the underlying engine. This is a huge drawback of their architecture.
I work in a heavily regulated environment. While we do test every piece independently we also deploy to a test environment that is a clone of production and a group of people has to manually check to make sure everything is correct with a well documented test script to follow. With lives on the line for making a mistake the process is slow to prevent any accidents.
Unfortunately the data say the opposite. By going slower, the result is lower-quality software, not higher. See the State of DevOps reports, and read the Accelerate book. We have a better way to build mission-critical, and safety-critical systems now. That is being adopted in medical devices, military devices, and space-craft as well as many others.
@@ContinuousDelivery Somehow we need to convince the regulators of that. We do test every piece independently with automated test suites and normal CI/CD systems. It is just that at some point a decision is made to release to production and all of that stuff is moved to a test server. We already know all the tests have passed but people need to go through and check many things manually before it can be released to production.
@@Immudzen I have seen that done too in various forms of finance systems and for medical devices. It is my argument that you can't be really compliant without this way of working. I outline some of those ideas in this blog post on "Continuous Compliance": www.davefarley.net/?p=285
@@Immudzen It seems to me that what you describe is an integration test. That is, you did test the "pieces" independently, but that doesn't guarantee successful operation of the entire construct (in case some interpretation differences at interfaces). So to ask for an integration test seems reasonable.
Now of course the interesting question is, did you ever have a failed test in the "test server" phase?
@@ContinuousDelivery lol - this is why more and more "Medical devices recalls" are due to Software failures - you guys are playing with people's lives - instead of asking the damn companies to spend more money on testing, you people are selling the snake oil that "less testing" is good etc. Are the developers competent enough to define the "contracts" very well so that there will not be more bugs ? Testing is the last place to catch the Issues before they reach production. But you people are blaming the E2E tests for the delays while the real reason is the lack of Investment by the software companies on Testing. Imagine a Car company deciding to do "Contract testing" and fire their quality department - that's what the software companies are doing nowadays and they are doing it because they want to earn more profits - and your "Over Engineered" solutions are not easy to understand and there is no real example to show that these things work.
Thanks for the video. Do your learnings apply to Selenium testers who automate end-to-end scenarios? Do they need to work with devs to mock the integrating applications that form the full system?
Nah. When you have a really bad dependency that may all of a sudden change how it works, at least checking the happy path in e2e proves useful. For the other things you can also have local component-level tests that run on the CI build. Just fake the adapters to behave in whatever way you prefer and test the rest of the system.
Thanks for this very nice video! I like the calm atmosphere!
What frustrates me endlessly is that i have to explain this again and again and again in each project i join while it seems so obvious to me. This approach works as good as it gets, if your code works against this isolated setup, it 99% of the time also works in production (after the initial setup of the interface) and you can fully test it with a behaviour driven testing approach, which also provides you a "documentation" of how your system should work as a black-box (or grey-box). It's faster, it's more reliable, it's cheaper, Frameworks like Spring and Quarkus provide all the tools, it can be executed locally... yet still people try to script giant test suites over dozens of microservices and run them all on every commit and don't invest anything in isolated TDD/BDD tests. Frustrating.
Consumer driven contract tests are a nice addition, but they come with a lot of pain because the PACT framework is horribly unintuitive and many developers have a hard time understanding the concepts.
Imho this video makes a good point about why E2E is treacherous water and leads to a slow feedback strategy, then it backpedals by introducing multiple concepts where you should use E2E, and does it in a way that only programmers (and not stakeholders) might understand.
That's very high level criticism. I saw very few experts that ever made this argument within 20 minutes.
But, imho, we still need a better language to address this paradigm.
I tried to convince my project leaders about 10 years ago with a similar train of thought, and I lost them at "when we do it right, we don't need E2E anymore."
Because they - rightfully - argued that we will probably never make it right.
E2E is not complicated - It is what the real customer or the end user is going to do when they use the product. E2E is projected as complicated by vested interests -aka companies that don't want to spend money on testing and lazy developers who don't want to take accountability for their bugs. I have seen enough of crappy apps and websites in the last 10 years where even the basic scenarios are failing but I'm sure the development teams had "contract tests" like the ones mentioned in the video that are passing and the pipeline was "Green". In such scenarios, I have seen the development teams don't even want to take accountability for the Issues and simply pass on the blame because they are not responsible for the full system. Over Engineering is killing Software quality and the snake oil salesmen are selling "don't spend money on testing" ideas to the companies that can ultimately lead to loss of business.
I think of E2E testing as "tests for tests". Integration tests are tests for unit tests, E2E tests are tests for integration and unit tests (and contract tests and load tests, if in prod, don't @ me). Users are the ultimate tests for E2E tests.
Each of these stages gets less granular and more prone to mis-specification or lack of interpretability, but each should inform the layer below when a problem is detected
I understand we don't need to do a full coverage with end-to-end testing and it's sometimes logically impossible. But it has to be there even as a smoke test. Because "stubbing" and "mocking" is sometimes an oversimplification and its inputs are based on our own imagination. We still need at least several test cases with end-to-end just to make sure we didn't miss anything.
I'm concerned about the simplification of having only one input and one output or just one upstream and one downstream systems.
I know many examples of two-way systems. While system A may be the upstream for the request, it may be the downstream of the response.
Also, a system may depend on many other systems and many more may depend on the first one.
Finally, there's the state. Some systems may be stateful by design. Well, you always can re-design a stateful system as a stateless one by extracting state into another system, but then you should test that new stateful system as well, right?
I'd really like to see how contract testing works with multiple inputs, outputs, preconditions and postconditions.
There are so many definitions of E2E out there in the wild. For some it's simply talking to a GUI or API through some automated ways. For some it's a test across the stack for a single application, there is the application to application kind of test but it could even mean an infrastructure test. Heck even a single ping from a client to a server could qualify as an E2E scenario.
I'm glad to see the rest of the world is catching up with simply being aware of one sided contract changes occurring without notice. If the contract is being defined good enough simply tracking the meta changes on the contract should be enough of an indication of things breaking somewhere on the data river across the corporate systems landscape.
Sadly more often the contracts are loosely defined, so open on purpose as it is hard to make a proper contract. And thus things break on garbage being pushed down the funnel or changes made of what means what when and thus causing trouble for the handling of actual work in progress.
As I heard a wise dude say once, the only contract you can trust to never change is math, for the rest is all about your naivety.
Ignoring any 'change' done by a developer you need to regularly 'daily' run automated regression end to end tests - why ?
Uncontrolled Change within the environments by third parties.
1) Your users computers are being continually 'patched' for security and functional changes. (remember patch Tuesday!)
2) Your production systems are being continually 'patched' for security enhancements and network reconfigurations.
3) The browsers that your web applications run on are continually evolving functionally and updates for security (and data gathering!).
Running your tests picks up this 'uncontrolled' but deliberate change to functionality and security in the environment.
Our regression tests have picked up these kinds of changes, where our software (although unchanged) has been affected by the environment that the system runs on and the parts that run on the customers' computers. This allows the developers to make fixes due to the changing environment.
So if we need to run end to end just to keep up with uncontrolled change - we must certainly run end to end when developers make change.
I usually call them acceptance tests and the way I test them is by mocking external services with mocking tools like Wiremock and dockerized infrastructure in order to test the contracts as is mentioned in this video
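To illustrate the idea of mocking an external service at the HTTP level, here is a minimal sketch using only the Python standard library instead of Wiremock. The service name, the `/price/42` endpoint, the JSON payload and the `fetch_price` function are all made up for the example; a real setup would stub whatever the external contract actually specifies.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakePriceService(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an external pricing API (not a real service)."""

    def do_GET(self):
        if self.path == "/price/42":
            status, payload = 200, {"id": 42, "price": 9.99}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def fetch_price(host, port, product_id):
    """Code under test: talks to the service over real HTTP, unaware it is a fake."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/price/{product_id}")
        response = conn.getresponse()
        if response.status != 200:
            raise LookupError(f"no price for product {product_id}")
        return json.loads(response.read())["price"]
    finally:
        conn.close()


def run_against_fake(product_id=42):
    """Start the fake on a free local port, call the code under test, shut down."""
    server = HTTPServer(("127.0.0.1", 0), FakePriceService)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_price("127.0.0.1", server.server_port, product_id)
    finally:
        server.shutdown()
        server.server_close()
```

The code under test goes through the same network interface it would use in production, so the HTTP "glue" is exercised while the responses stay under the test's control.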
What about set of microservices owned by one team, communicating through both async messages as well as standard http request-response? Would you do only contract testing described here, treating these microservices as an external system in fact, or would you write full e2e tests, if you control the whole environment where these tests are run ?
It actually works fairly well if you are very diligent about IaC and test data generation, but if you aren't at near 100% automation, you're gonna have a bad time. In fact, our issue is it's so easy people spin up entire environments to test simple things, which is expensive unless everything is serverless.
I think you are way underselling the effort to mock N apis for 'isolated' testing as well as ignoring what it DOESN'T test. You really want to test the 'glue' as well as the 'components' which mocking doesn't really achieve.
When you have IaC mastered you can do things like bring up an environment that is dedicated to doing some long running 'nightly' job that requires several async components chatting through multiple pieces of infra (rest calls, MQs, api gateway, etc). It's hard to be overly confident mocking something like that.
Great response. Mocks are a really flaky concept and the entire take on contract testing is not a fair comparison to E2E. At the end of the day, the user is going to do something and expect something to happen. E2E makes sure the right stuff happens.
Unit/Integration tests with mocks might be useful for maintaining independent functional pieces but how do you know if your entire system is working without tying the real production pieces together? How do you know your mocks are accurately simulating the real external service? Automated tests are a big burden to write and maintain.
What I describe isn't really mocking, as most people mean it. It is system-level mocking through the same interfaces that external components, sub-systems, and external collaborators use. So we are testing the 'glue'.
Well we were in production on a large system with thousands of users for over 13 months before the first defect was noticed by a user, so it's not all that 'flaky'.
@@ContinuousDelivery Would you like to hear my anecdote about how it is?
@@ContinuousDelivery I don't see how that's possible.
If you have 2 components that are coupled via an MQ, mocking both sides isn't enough. You also have to verify all the configs/permissions etc are in place as well. Further, how do you test that those permissions have the desired effect on the system w/o an environment to do so?
Wow...
So i switched jobs recently (Jan 21). I used to work as a software engineer, prior to my current job.
Now i have to "manage" a project that basically shows all the flaws you mention here.
The software is delivered by an external company.
It is so frustrating that even the small things do not work... especially if you know how TDD works and how beneficial it can be.
Yesterday we got a software version and a very basic functionality (logging a string at the beginning of a function) failed / led to an application crash! because of a typo in the log message.
We are all humans, and error can happen. but that this version gets shipped is horrible.
This project lasts almost the whole time that i am at the company. For two reasons: we deploy it on more machines with minor features added, we fix bugs. And yes. bugs like i mentioned before.
I spent my Christmas holiday on implementing the software on my own, completely TDD and i was done in a week. sure it was never tested, but the main features were implemented and tested. technical test coverage 100%.
Management does not really see to replace the external company... and i am already thinking about looking for a new job... since this is sooo frustrating.
I know the question might sound dumb: are there any ideas how to reorganize this?
@Kevin Dietz:
You see you are not alone ;-)
Why not open your own company and put that external company out of business? You already know the customer and the product is finished ;-)
@@Martinit0 It would definitely be an option.
But i know how our company picks suppliers... i would not be one of them... especially if i quit the company and try to work on my own... super emotional picking process.
Do you do your "contract testing" or your isolated tests for your system still using a DSL?
My optimal E2E test consists only of Docker containers. So that you can run it locally, inside a CI/CD pipeline, maybe even in production in the same configuration.
The image versions allow you to finely control the systems that are otherwise out of your control.
I would also argue NOT to do unit testing. Because most of the time we're not testing algorithms and equations. We're testing if a function calls two other functions. No real meaning behind the test case. So what the average developer does is just set the expected to the actual because "he wrote the code and knows what it's supposed to do". And now if there's a bug, fixing the bug would fail the test case. The test case would lock the bug inside.
what are the tools you use for mocking the downstream and upstream services?
I usually write my own, as part of developing acceptance tests for my code.
I do see some logic in the argument that if you can’t be trusted to test your interfaces how can you end to end test…
Where I do see some things to object about; teams that cross check their software against standards simulators / checkers / verifiers still get pieces of software that don’t fit together. Team A and Team B both get a green light that they correctly implemented the standard and their interfaces are OK. Once they eventually are tested together you learn that their compatible implementations still contain small interpretation differences and do not work together.
On my shopping bag for “good enough testing” I would like software to be deployable in real world with a good egress filter (stupid “I cannot install because I’m not in R&D network” errors) and an automated integration suite that at least shows red when the pieces don’t fit together.
For some software “don’t do big pieces of muddy software” isn’t a good enough answer. Since the different pieces are standardized in big standards with lots of room for subtle interpretation difference :)
Well it depends on the degree of coupling, and if the coupling is high between these parts then the answer is to increase the scope of the system, with shared code between teams, and the scope of your deployment pipeline to match. Now you run all the tests (E2E for your SW) for all the parts of the system together, and release all the pieces together. Define the boundaries by what makes a sensible "releasable unit of software".
If "our system" (system B) was a service that consumed messages, did some work, and published new messages, would you consider system A the consuming, system B "doing the work" and system C producing the messages? Would you mock A and C, testing the logic of B given that, or would A and C not be different systems in my example, and instead be considered part of system B too?
No right answer here, it depends, but deciding on what the boundaries of your system are is important. I think that the key indicator, is "what is the scope of deployment?" If you deploy all these things together, they are part of a single system and should be evaluated together, if parts are deployed separately, they are not and are better treated as independent systems, maybe protected by contract tests at the points where they communicate.
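For the case where B is deployed on its own, a rough sketch of "fake A and C, evaluate B" might look like the following. Everything here is hypothetical: the message shape, the `receive`/`publish` method names and the order-total logic are invented for illustration, and a real system would fake its actual messaging interface.

```python
from collections import deque


class FakeInbound:
    """Stands in for system A: hands pre-arranged messages to system B."""

    def __init__(self, messages):
        self._queue = deque(messages)

    def receive(self):
        # Returns the next message, or None when the queue is drained.
        return self._queue.popleft() if self._queue else None


class FakeOutbound:
    """Stands in for system C: records everything system B publishes."""

    def __init__(self):
        self.published = []

    def publish(self, message):
        self.published.append(message)


def process_orders(inbound, outbound):
    """System B (the system under test): consume, do some work, publish."""
    while True:
        message = inbound.receive()
        if message is None:
            break
        total = sum(item["qty"] * item["unit_price_cents"] for item in message["items"])
        outbound.publish({"order_id": message["order_id"], "total_cents": total})


if __name__ == "__main__":
    inbound = FakeInbound(
        [{"order_id": "A1", "items": [{"qty": 2, "unit_price_cents": 250}]}]
    )
    outbound = FakeOutbound()
    process_orders(inbound, outbound)
    print(outbound.published)
```

If A, B and C were always deployed together, the same test would instead run against all three as one system, which is the boundary decision described above.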
Is a QA process a better idea ?
No, not if you mean delegating the testing to QA, here's why: ua-cam.com/video/XhFVtuNDAoM/v-deo.html
There's so much truth in this, but the solution evades me. The complexity can be _immense_.
In my experience with e2e testing, the biggest issue I have seen, is when the _data_ changes - and this data is often _real_ data, not mocked.
An example:
A frontend application which makes a request to an endpoint to get data. It could be a request based on search parameters, as an example.
The issue is that the data is in constant flux - it's being modified, via other software.
A product availability may change, or the name of a product changes - it doesn't matter exactly _what_ that change is, other than it is _change_.
With a great deal of frontend end-to-end testing, you could be using something like selenium.
You want to assert, that when a search is done with specific parameters, you expect some data to be there.
It's so fragile - and if you are implementing smoke tests as part of your CI/CD solution, the build fails when an assert fails - and that could be because that data isn't there _or_ it has changed. The title is no longer "ABC", it is now "XYZ"
The data should _always_ be considered to be in flux. It should be expected it _can_ change.
Yet this type of end-to-end testing doesn't cover that - it blindly expects, that when you do a search for a product with a unique identifier, you will _always_ get the same data back.
So, to get around this, I guess you could employ techniques where the tests connect to _mock data_ - but what if the underlying systems that provide the real data have changed, but your mock data isn't updated to recognise this? - an additional data point has been added or its type has changed - or any other manner of change has happened. That means you have to find a way to automate the update of mock data.
It's a horror show.
The "solution" often ends up being half-baked - you don't check for _specific_ data, just the placeholder where that data would appear - the _markup_ that surrounds it.
In the case of HTML, you assert that, for example, a heading HTML tag actually has something in it - doesn't matter what, there's just _something_ there.
That's fine, until the entire request fails because the product ID your tests are using is no longer there - and your software has handled this with a 404 or some other graceful fail. The end-to-end tests _don't_ know this. They timeout. The smoke tests fail, nothing gets deployed.
Crazy. Horror show.
What if in a distributed system a certain workflow spreads across multiple services/applications? You still need a way to say if all services together do the right thing in ensemble
If you need to test them all together before release, then they are a "single deployable unit" and so should be version controlled, built, tested and deployed together, and if you are aiming to do a good job, you need an answer to the question "is all this stuff releasable" at least once per day - that's what Continuous Delivery is.
These are great ideas, I wish I could figure out how to implement them with the stuff my team works on. We currently put a huge amount of energy into making full test builds of the product, which our work is only a small component of, installing it on multiple platforms, and doing live testing.
Try testing to the edges of your system, deploy it as if it was running in prod, but fake everything around it. Do this first for a single, simple, but real case, and in parallel with your existing testing strategy, do it as an experiment. The degree to which it is hard to fake at the edges of your system tells you a lot about your system's interfaces. Next try to make it easier to fake the interactions by improving the abstraction at the edges of your system. Build little defensive walls between your system and external systems if necessary (Ports & Adaptors) at some point you will see light at the end of the tunnel, because things will get easier. Now you have software that's easier to test AND software that is better designed.
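A minimal sketch of the "little defensive wall" idea (Ports & Adaptors) could look like this. The inventory example, the `InventoryPort` name and its single method are assumptions made up for illustration; the point is only that the system depends on an abstraction it owns, so a fake can be swapped in at the edge.

```python
from typing import Protocol


class InventoryPort(Protocol):
    """The port: what our system needs from the outside world, in our own terms."""

    def stock_level(self, sku: str) -> int:
        ...


class FakeInventory:
    """Test-side adapter: fakes the external inventory system at the edge."""

    def __init__(self, levels):
        self._levels = dict(levels)

    def stock_level(self, sku: str) -> int:
        return self._levels.get(sku, 0)


class OrderService:
    """Our system: knows only the port, never the external system's own API."""

    def __init__(self, inventory: InventoryPort):
        self._inventory = inventory

    def can_fulfil(self, sku: str, quantity: int) -> bool:
        return self._inventory.stock_level(sku) >= quantity


if __name__ == "__main__":
    service = OrderService(FakeInventory({"widget": 3}))
    print(service.can_fulfil("widget", 2))
```

A production adapter would implement the same `stock_level` method by calling the real external system; how hard that adapter is to write is one signal of how clean the interface is.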
Do you have a monolith?
@@semosancus5506 It's not that it's a monolith, it's more that the way my team's product interacts with others is highly complex, and there are numerous organizational dysfunctions, such that even though we have thorough and robust automated unit and regression tests on the actual data and logic of our product, every time something unexpected happens, even if it was because of some other piece of the overall architecture, we internalize the need to keep our asses covered at all times. So we continue to keep this team of testers - who would all make great developers IMO - and spend 75% of our capacity each sprint on setting up a dozen test environments and running live tests.
@@transentient complex interactions usually indicate that there is a mismatch between the problem domain and the organization (not quite, but kind of Conway's law). That could also be the source of "ass covering" when problems go wrong. With complex interactions, it allows for teams to hide behind the complexity and run misdirection campaigns. I would look at stepping back, assessing your problem domain and organize around the seams you can discern in that domain. Then give 100% ownership of a domain to a team and take away the ability to hide behind complexity. I have my teams work really hard to define a clear domain boundary and codify it into API with Contracts. Overall teams love it and are much more productive and happy. I can't tell you how often I hear "We screwed up" instead of "not our fault". To tie this back to this video and your original comment, If you can possibly do some org structure changes, then you will find the need to spend big on setting up E2E tests will drop. If you need to sell all this to your senior leadership, I would do some ROI computations on spending 75% of your sprint cap on this effort. If the ROI sucks, then you could propose your org try to carve out one well defined bounded context in your problem domain and solidify the API with Contracts. Then measure the number of bugs in these interactions vs. others.
I really like the ideas exposed here and I have thought about them myself, but I really would like to see some references or documentation where the idea is expanded or some other people discuss it; that's because I would like to share some of these ideas to my teamwork, but just saying stuff like: "ey watch, I really like these video ideas" and then sending a UA-cam link just doesn't sound very serious or professional for me
“the code whisperer” may be opinionated but the articles contain references to other materials.
“Some programmers insist on painting a wall from three metres back. I understand why they might prefer it (it seems easier), but I’ll never understand why they consider it more effective than picking up a brush” J. B. Rainsberger
what's the point of testing if my software works under situations I control? I want to know if my software will work out there in the wild, especially its critical business features. Of course testing that deals with coupling problems: my system *IS* coupled to other systems. We shouldn't stop using a tool because it deals with problems it doesn't cause; only if its cons outweigh its pros.
Durability is something I found too many middy-engineers didn't seem to respect at AWS. Too many placed far too great a value in having a large number of tests, and getting there quickly by flipping/adding insignificant variables and verifying the most minor of details didn't change, without thinking of what or why, rather than adding quality tests. So much time wasted fixing tests that tested something they never should have been generating noise and wasting effort, which ultimately makes the system less reliable as it becomes a practice for engineers to save energy and effort by ignoring some failures or not really taking a closer look at them.
hi dave. loved your insights as always. my question to you is: if we are doing these sorts of "end" 2 "end" testing, do we still need to write large unit tests, or are these sorts of tests enough to give us the confidence to engage in the continuous delivery paradigm? it's also good to note the amount of resources that we can spare and the outcome to drive given the time constraint on a project in very small teams as opposed to something you might be used to at Getco or ThoughtWorks 🤔
What about when infrastructure is automated? We can deploy a new throw-away cluster in minutes thanks to Infra-as-code (terraform + argocd or pulumi). Our pipelines can deploy any version we want (any service or all) also in minutes. There is no human intervention. This can even be used for load-tests as a stage in a pipeline if needed or the end-to-end tests.
Obviously automating infrastructure is a good thing, but it doesn't help you to control the variables in terms of the systems that are outside your direct control. Just as a thought experiment imagine testing the internet vs testing just your SW. There is a difference! The bigger the system, the less you have control in your tests. This is a fact, not opinion. "Bigger and more complex" is always harder to test.
@@ContinuousDelivery Not sure what you mean by "SW".
@@jfpinero SW = "Software"
How do you detect that the other software doesn't satisfy the contract anymore?
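One possible way to detect this, sketched here as an assumption rather than as the approach from the video, is to keep the consumer's expectations in a machine-checkable form and run them against the provider's real responses in the provider's own pipeline. The field names and the `contract_violations` helper below are hypothetical; tools such as Pact do a more complete version of this.

```python
# Hypothetical consumer expectations: field name -> required type.
PRODUCT_CONTRACT = {"id": int, "name": str, "available": bool}


def contract_violations(response, contract):
    """Return a list of human-readable problems; an empty list means the contract holds.

    Extra fields in the response are tolerated, since consumers usually ignore
    data they do not use.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif type(response[field]) is not expected_type:  # strict: bool is not accepted as int
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    # In a provider's build this dict would come from calling the real endpoint.
    changed_response = {"id": "42", "name": "Widget"}
    for problem in contract_violations(changed_response, PRODUCT_CONTRACT):
        print(problem)
```

Run on the provider side, a non-empty result fails the provider's build before the breaking change ships, rather than surfacing later in the consumer.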
This has nothing to do with E2E, but what do you think of randomized (unit-) testing, like e.g. with Hypothesis for Python?
Hi Dave, suppose I am using C++ in Visual Studio. What software/tool is used for this in vs?
I've stopped thinking about e2e testing now because of the video. Not because it's not important; rather, since a single service for sure always interacts with other services, we should focus on different combinations of input and expected output on the SUT, in terms of request-response behavior testing and contract testing. That would be enough to make us confident when we see a green light after all pre-defined test cases pass in the pipeline.
As soon as you start using a test rig or stub/mock implementation of external systems that your SUT is interacting with in an E2E test, you are practically rewriting unit tests, which is a waste of time.
Do you assume that E2E testing is the only testing ever done? I mean what happens to the use of unit tests, smoke tests, and user acceptance tests? I have never heard that E2E is a catch all for all testing, so if companies are doing this, then they have missed some very significant parts of the why and what testing is performed. I agree with everything you said, except for that part of E2E being the only testing done.
E2E testing is often, wrongly, interpreted by orgs new to automated testing as the easy alternative to doing a proper job of test automation. So it being the only form of testing is not a defining characteristic for e2e, but it is a common usage of it.
@@ContinuousDelivery So then that is a yes, you actually believe that E2E is used as the only testing and further you believe they use E2E as full coverage testing. I agree that is wrong if they are doing it that way. I am just skeptical that this is how testing is being performed with only the use of E2E. I am sure that you can find some companies some place using it like that, but I do not think it is the norm.
@@kennethgee2004 I don't think I said it was "the norm" but it is fairly common in big firms. It usually takes the form of them deferring all testing to a QA team. The QA team are now responsible for gatekeeping releases, and are overwhelmed. They read about automated testing, but are in the wrong place to do a really good job, because you need to build testing into the process of development itself, not try to add it as an afterthought. So they attempt to automate, or at least supplement manual release testing with automation, and you end up with over-complex, overly fragile, e2e tests as the only automated testing being done.
Not everywhere, but an extremely common anti-pattern.
@@ContinuousDelivery Possible. I still do not see that a QA testing team is inherently leading to that type of testing. They can just be doing the other testing on a different team. The whole point to QA is to ensure quality. Is that not what a unit test performs? The quality of the tests is not determined by structure or paradigm, but how the testing is performed and on what they test.
@@kennethgee2004 You keep putting words in my mouth. I didn't say "inherently leading to..." I said it is a common anti-pattern, those are not the same things.
I think that QA is often seen in the role of "quality gatekeepers" and I think that is a big mistake. To quote Deming, "You don't inspect quality into a product, you build it in". For software that means that "testing after" is the wrong answer, whoever does it. You don't get to real quality that way. TDD is a LOT more than a QA process. It changes how we organise dev and how we design code, for the better.
“Manual testing” seems to be “actually trying the damned thing”. What’s really interesting is how many people building (and testing) a product seem to be annoyed at the idea of actually trying it.
I agree, trying it is a really good idea, not always possible, but great if you can. But it is not the same as testing, and not even close to being "enough" for testing most systems.
@@ContinuousDelivery Testing helps us to understand the status of the product through experiencing, exploring and experimenting with it.
@@ContinuousDelivery The false and unhelpful idea that testing can be automated prompts the division of testing into “manual testing” and “automated testing”.
Listen: no other aspect of software development (or indeed of any human social, cognitive, intellectual, critical, analytical, or investigative work) is divided that way. There are no “manual programmers”. There is no “automated research”. Managers don’t manage projects manually, and there is no “automated management”. Doctors may use very powerful and sophisticated tools, but there are no “automated doctors”, nor are there “manual doctors”, and no doctor would accept for one minute being categorized that way.
@@ContinuousDelivery Testing cannot be automated. Period. Certain tasks within and around testing can benefit a lot from tools, but having machinery punch virtual keys and compare product output to specified output is not more “automated testing” than spell-checking is “automated editing”. Enough of all that, please.
@@ContinuousDelivery Also, I would love it if you did a video about testing with Michael Bolton from developsense, that would be an epic watch/debate between two very smart guys, still I'll dream on!
Is software testing & the practitioners thereof, really so far behind the curve that these kind of courses are actually needed?? IMO, all this particular material achieves is to cover off a test pattern (integration in its many & varied guises) that is well understood & have been widely employed for many decades, outside the software world - yes, there really is one ... in which we all live. That being said, it is encouraging to know that I'm not the only one holding & espousing, indeed evangelizing, this PoV.
Tests don't make quality software, developers do. Developers that have enough time to sit and think. The problem with unit testing for example is you jump right into coding. Most of the time not coding anything but thinking for a while first is much better, to see the edge cases by thinking
What testing qualifications do you hold?
Isn't what you describe the same as "Out of process component tests" + "Contract testing"?
No, it's not the same. Acceptance Tests test that your SW is releasable, that is it can be deployed into production without any more checking. Component tests don't usually say "it's ready to release", they may increase your confidence, but probably will require more work later.