I am an experienced developer and have filled SDET roles before. I hadn't thought of the idea of creating properties to represent kinds of tests before either. It is really clever. We shouldn't feel bad for not inventing everything. Let's just keep doing our best.
I have an unnatural fondness for unit tests and am always looking for ways to improve and enrich my techniques. So I really enjoyed this talk. I'm still not entirely clear yet on when I would use it but I'm sure that with a bit of practical experience I'd start to understand where and when it would be a superior approach. Thanks!
Robert Martin says (re: TDD) "as the tests get more specific, the implementation gets more generic" - as a workaround to not adding another naive case statement to handle the new test.
Note that these 3 properties do not uniquely define addition: Consider 8-bit bitwise OR vs. 8-bit addition. x OR y = y OR x, x OR (y OR z) = (x OR y) OR z, and x OR 0 = x. However 1 + 1 = 2, but 1 OR 1 = 1.
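A quick way to see this is to run the three properties against both operations. The sketch below is only an illustration: `add8`, `or8` and `passes_three_properties` are made-up names, and plain random sampling stands in for a real property-testing library.

```python
import random

def add8(x, y):
    # 8-bit unsigned addition (wraps around at 256)
    return (x + y) % 256

def or8(x, y):
    # 8-bit bitwise OR
    return x | y

def passes_three_properties(op, trials=1000, seed=0):
    """Check commutativity, associativity and 'adding 0 does nothing' on random bytes."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.randrange(256) for _ in range(3))
        if op(x, y) != op(y, x):
            return False
        if op(x, op(y, z)) != op(op(x, y), z):
            return False
        if op(x, 0) != x:
            return False
    return True

print(passes_three_properties(add8))  # True
print(passes_three_properties(or8))   # True
print(add8(1, 1), or8(1, 1))          # 2 1
```

Both operations pass all three checks yet disagree on 1 and 1, so some extra property or a plain example is needed to tell them apart.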
That's the tricky thing, how do you capture the actual increment from a function without re-implementing it. To me, it seems like the properties provide a very good foundation, but ultimately you'll still want to include plain and simple "examples" to confirm the essence of what you want, in addition to the property testing that's being set. Ultimately most tests that require complexity will always start off with those straightforward example tests, so they're already there. Adding property tests as they're determined will provide additional strength.
You could add in the properties of the numbers being comparable. For example, if you add two integers x + y = z, the following properties must be true:
- if x > 0 then z > y,
- if x = 0 then z = y,
- if x < 0 then z < y.
This works if you take comparisons to be a more primitive operation than addition.
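Those three ordering rules translate fairly directly into a check. This is a rough sketch with an invented helper name (`ordering_property_holds`) and unbounded Python integers, so overflow is not considered here.

```python
import random

def ordering_property_holds(add, trials=1000, seed=0):
    """Check z = add(x, y) against y according to the sign of x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        y = rng.randint(-1000, 1000)
        z = add(x, y)
        if x > 0 and not z > y:
            return False
        if x == 0 and z != y:
            return False
        if x < 0 and not z < y:
            return False
    return True

print(ordering_property_holds(lambda x, y: x + y))      # True
print(ordering_property_holds(lambda x, y: x | y))      # False: OR never makes the result smaller
print(ordering_property_holds(lambda x, y: 2 * x + y))  # True: still not addition
```

The check rules out bitwise OR, but `2 * x + y` also satisfies it, so it seems to work best in combination with the other properties rather than on its own.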
That seems like a low-level implementation difference. That's still important to think about, but maybe keep it separate as a unit test of the `add` operation for a u8 (unsigned byte). At a higher level, the average person working in the domain would expect that addition is just associative, commutative and unital (even without using those words) because that's how addition of all real numbers (and complex ones) works. That way your property-based tests should pass whether the implementation uses a 32-bit float or a signed big int, because it matches the spec of a "number". But if it's important to also have tests that care about the low-level bit encoding and assembly-level operations, those could be added too.
"Adding 0 is the same as doing nothing". In other words: 0 is the identity value for addition, right? 0 is also the identity value for subtraction, but not for multiplication (1 is the identity value for multiplication)
14:20 EDFH would just perform addition normally. Then if the second value was zero, return the answer. If the second value wasn't zero, multiply the answer by 2 and return it. His goal isn't to write less code that works. His goal is to hand in as much code as possible that doesn't work. The only real solution is to get the EDFH fired and burn down and re-implement every file he ever touched.
Indeed. Property testing relies on the code performing the same actions on the entire testbed, so literally any perfect code can be ruined by the EDFH adding a special case at the top that kills the program whenever one specific input is processed. That's why no amount of testing will ever negate the need for proofreading other people's code.
This will still fail test 1 (Adding 1 twice is the same as adding 2). Let's say x = 10, then x+1 will be 22, then 22 + 1 will be 46. But x+2 will be 24
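Working through that arithmetic in code, as a sketch of the doubling implementation described above (`edfh_add` is just a name picked for illustration):

```python
def edfh_add(x, y):
    # Add normally, but double the answer unless the second value is zero.
    answer = x + y
    if y == 0:
        return answer
    return answer * 2

x = 10
once = edfh_add(x, 1)      # (10 + 1) * 2 = 22
twice = edfh_add(once, 1)  # (22 + 1) * 2 = 46
direct = edfh_add(x, 2)    # (10 + 2) * 2 = 24

print(once, twice, direct)  # 22 46 24
print(twice == direct)      # False: "adding 1 twice" differs from "adding 2"
```

The identity property alone would not catch this, since `edfh_add(x, 0)` returns `x` for every `x`.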
It's funny that this criticism reveals the actual problem of Wlaschin's examples: arithmetic addition is an operation that is well-studied and its properties are extremely well-established. Which is to say he's correct: the _definition_ of an arithmetic addition operation is that, given two real numbers, the operation obeys the commutative property, associative property, and identity property. Pezsmapatkany's point is that Warp Zone's algorithm fails the associative property test. But Warp Zone's actual conclusion is still correct: they merely chose the wrong example. In the vast majority of real-world cases, we do not understand the operation as thoroughly as mathematicians understand arithmetic addition. We do not have a coherent and complete set of properties to draw from to describe all given computations, never mind the abstractions on top of that like GUIs.
This is very interesting as a mathematical proof, but I am having trouble understanding how to apply it as a tester without a degree in mathematics. I ran into this reimplementation problem while trying to test a calculator on my last project. It proved to be beneficial because having two implementations in 2 different languages developed independently is a rock solid test. It was also a great way to find bugs without having to manually sit down with a calculator to try to figure out expected results. I did have to think up some input values, though. This was a great presentation and gives me a lot to think about.
I don't think 2 different implementations are a rock solid test. If one fails to understand every detail of a requirement, the two implementations will most likely have the same conceptual flaws. But that can also be true if one just writes tests against specific parameters. I am not sure I really think the shown approach is efficient, but it can definitely lead to a deeper understanding.
Development was born from computer science which was born from (discrete) math which is mainly algebra and logic. Even though none of us can be a master at everything and we all have to start somewhere, I don't believe it's a good habit for us to scoff at the original foundations of our craft. Airplane designers are expected to understand fluid dynamics, so why should we not at least try our best to understand the more rigorous parts of good program design? It's true that unfortunately academics obscure these simple principles behind "monad", "commutative" and the like, but most terms that we aren't scared of are themselves from math: functions were mathematical first, classes are a mathematical notion too, etc. I believe it's been programmers' avoidance of the deep ideas that's led to many of the problems in code nowadays. That doesn't mean the more pragmatic "just get it done" mindset hasn't also been useful compared to a theory-only approach. But there's a reason most mathematical objects are themselves only described as a tuple of primitive notions with certain properties that always hold: it's the best mix of generality and correctness. So maybe finding a way of making a (programming) class that has one method which only takes itself and another object of the same type and maps it to a new object of its same type in a way that explicitly is associative and commutative is much more useful than having a class with 20 different methods which take 10 different parameters of various types to do crazy things. But at least we can feel good that we're watching videos like this, trying to learn to go back to the mathematical roots to do things correctly.
Great talk. Only thing that I wonder about is at the very end with the facial recognition examples. I don’t think it is given that a facial recognition software would necessarily place the “box” around the face in the exact same way on two images that are otherwise identical but where one has been rotated at an angle. Even with a simple angle like 90, 180 or 270 degrees. And likewise, turning one copy of the image into black and white could probably affect the result too.
How to handle exceptions with this? Like when creating properties for a division operator, we would want to handle division by zero separately. Also, for addition, how to handle arithmetic overflows if the generated numbers are too big?
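One common way to approach this, sketched here with invented names (`divide`, `nonzero_int` and the two property functions are not from the talk), is to keep zero out of the generator for the main property and give the exceptional case a property of its own. Python integers do not overflow, so the overflow half of the question does not show up in this sketch; in a fixed-width language, bounding the generator's range would be one option.

```python
import random

def divide(x, y):
    # Hypothetical function under test: integer division.
    return x // y

def nonzero_int(rng, lo=-1000, hi=1000):
    # Generator that filters out the one input the main property does not cover.
    while True:
        value = rng.randint(lo, hi)
        if value != 0:
            return value

def division_undoes_multiplication(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        y = nonzero_int(rng)
        if divide(x * y, y) != x:
            return False
    return True

def division_by_zero_raises(trials=100, seed=0):
    # The exceptional case gets its own property instead of being ignored.
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        try:
            divide(x, 0)
        except ZeroDivisionError:
            continue
        return False
    return True

print(division_undoes_multiplication())  # True
print(division_by_zero_raises())         # True
```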
@Chris Warburton - I disagree that this approach makes you consider the edge cases up front when looking at the overflow problem. If you look at the rules for addition at 14:30 and you’re using an addition implementation which overflows silently (which maybe you didn’t want, but didn’t know you didn’t want it beforehand) you will find all those tests pass. Or even if the implementation spits out garbage when overflowing, but deterministic garbage, it will pass the tests.
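To make the overflow point concrete, here is a small experiment with three made-up 8-bit implementations. The function names and the extra "never smaller than the input" property are assumptions added for illustration, not something from the talk.

```python
import random

MAX = 255  # largest unsigned 8-bit value

def wrapping_add(x, y):
    return (x + y) % 256            # overflows silently

def saturating_add(x, y):
    return min(x + y, MAX)          # clamps at the maximum

def zero_on_overflow_add(x, y):
    total = x + y
    return total if total <= MAX else 0   # deterministic "garbage"

def passes_basic_properties(add, trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.randrange(256) for _ in range(3))
        if add(x, y) != add(y, x):
            return False
        if add(x, add(y, z)) != add(add(x, y), z):
            return False
        if add(x, 0) != x:
            return False
    return True

def never_smaller_than_input(add, trials=2000, seed=0):
    # Extra property for unsigned values: the sum is at least the first input.
    rng = random.Random(seed)
    return all(
        add(x, y) >= x
        for x, y in ((rng.randrange(256), rng.randrange(256)) for _ in range(trials))
    )

print(passes_basic_properties(wrapping_add))          # True
print(passes_basic_properties(saturating_add))        # True
print(passes_basic_properties(zero_on_overflow_add))  # False (associativity breaks)
print(never_smaller_than_input(wrapping_add))         # False
```

So silent wraparound and clamping do slip through the three basic properties, while this particular kind of garbage happens to break associativity (for example 200, 100, 5). Whether garbage passes depends on the garbage, and catching wraparound needs a property that mentions size.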
Ultimately, you're looking to check that your function (add, divide, more complicated function f) satisfies certain behaviors. This is where knowing advanced math (group theory, type theory, formal math for functions) has practical use for software development. You can literally write "operator+" for your objects so that it meets the mathematical definition of a group operator, ensuring it'll behave in the way users and other code expect from 'addition'. From there you can do similar things like guarantee your "undo" is a true inverse function, or your UUID function is bijective (one to one). And so forth. You use the PROPERTIES a function, object, etc. should have to guide its actual code and then use those same properties to black-box test it in unit tests. An advanced example is class types that form a group or ring. As in, I have objects X that have APIs allowing consumers to transform an X into a different X. By implementing class X and its APIs so that the set of 'X' forms a mathematical group (or ring, field, etc.), I guarantee that no sequence of calls to those APIs can generate an invalid 'X' object as an output, without lots of wasteful or brittle checks in the production code. And the same formal math gives clear direction on what tests I need to have 100% coverage for this subassembly.
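As a toy instance of that idea, the sketch below uses an invented `Rotation` class (quarter turns, so only four possible values) and checks the group laws exhaustively. The class and the `is_group` helper are illustrative assumptions, not anything from the talk.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Rotation:
    quarter_turns: int  # valid values are 0, 1, 2, 3

    def __post_init__(self):
        if self.quarter_turns not in range(4):
            raise ValueError("quarter_turns must be 0..3")

    def __add__(self, other):
        # Composition always lands back inside the valid set.
        return Rotation((self.quarter_turns + other.quarter_turns) % 4)

    def inverse(self):
        return Rotation((-self.quarter_turns) % 4)

IDENTITY = Rotation(0)
ALL_ROTATIONS = [Rotation(n) for n in range(4)]

def is_group(elements, identity):
    closed = all((a + b) in elements for a, b in product(elements, repeat=2))
    associative = all(
        (a + b) + c == a + (b + c) for a, b, c in product(elements, repeat=3)
    )
    has_identity = all(a + identity == a and identity + a == a for a in elements)
    has_inverses = all(a + a.inverse() == identity for a in elements)
    return closed and associative and has_identity and has_inverses

print(is_group(ALL_ROTATIONS, IDENTITY))  # True
```

With only four elements the laws can be checked exhaustively. For larger types the same four checks would presumably be run over generated values instead.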
@@zackyezek3760 I agree that learning more discrete math and abstract algebra has really changed how I try to write code. When we're in the imperative mindset, we think of code as a book of recipes for how to bake a data cake. But in the declarative math mindset, it's better to just describe what a cake has and the invariant properties a cake should have and accept whatever result matches those criteria. The radical simplicity allows for less false positive and false negative errors in the outcomes by both being rigorous and general.
@@o.sunsfamily Well, then you would still get a flaky test, so cannot get away with it in the long run, however you're not right. Let's say x = 10, then x+1 will result in 0, then 0+1 will result in 1, and x+2 will result in 0.
I've never liked this term lazy programmer. I prefer efficient system oriented designer and programmer using computer science, structured execution flow, modular programming, OOD/OOP, and more.
EDFH - now I know how to name it. Indeed I've met once such an implementation, in a real production chartplotter software, and it made memories for life for me.
"The EDFH can't create an incorrect implementation!" well... I paused at 19:13 to say it doesn't prevent the misguided dev from hardcoding a giant lookup table or incrementing/decrementing a copy of x, y times... I hope that sometime in the next N minutes you'll say "here's how to define & impose compute/memory constraints".
I remember being confused by PBT, because properties meant "variables within a class", and not "mathematical properties, such as commutativity" to me. Another reason why functional programming is never going to be popular with newbies: it overloads jargon with its own semantics, creating more confusion than clarification. I learned PBT and FP despite its jargon, not because of it.
This is very true! I think it's a form of gatekeeping because people feel smart when they use obscure terms. And let's face it that when you actually describe category theory concepts in simple English, they almost seem trivially obvious. But these core ideas are so useful precisely because they make everything so obvious when applied, and that's a hallmark of good design. I feel like there's a huge need in industry to make academic ideas more palatable to practitioners so that devs stop reinventing wheels with new frameworks and instead get the composition patterns right in the ways that a century of math has evidence for its success. I really appreciate people like Scott Wlaschin for not talking down to us with obscure terminology and focusing on the essential ideas so that we can actually form better coding habits!
What Test would you write to get around the EDFH if they wrote the add function such that if one of your inputs is zero, it returns the other variable, else it returns 0?
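For what it's worth, that implementation can be written down and checked against the properties from the talk. This is a sketch, with `edfh_add` as an illustrative name:

```python
def edfh_add(x, y):
    # Returns the other value if either input is zero, otherwise 0.
    if x == 0:
        return y
    if y == 0:
        return x
    return 0

# Commutativity and identity both hold for these inputs...
print(edfh_add(3, 7) == edfh_add(7, 3))  # True
print(edfh_add(5, 0) == 5)               # True

# ...but "adding 1 twice is the same as adding 2" does not:
print(edfh_add(edfh_add(10, 1), 1))      # 1
print(edfh_add(10, 2))                   # 0

# Associativity fails as well:
print(edfh_add(edfh_add(1, 2), 3))       # 3
print(edfh_add(1, edfh_add(2, 3)))       # 1
```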
While this seems like a good idea for dealing with actively malicious coders, or for testing library code and really mission critical rocket control systems etc., it does seem really hard to write for any code that is just slightly more complex than the trivial examples. Which to be fair is also a problem with writing good example-based tests that actually test something useful. I may be slightly biased since I'm primarily a frontend coder; and in frontend code the hard logic is trivial but the real issues are basically untestable. Frontend code is "correct" when it looks good, feels good and is understandable to a human; so the only way to test that is to have an actual human test it. I almost never write unit tests/automated tests (as in test code that inputs data and expects things about the output); I do write a lot of code in order to test things though; and that code often uses randomly generated inputs as well: but that's mostly code that generates synthetic content to test how it looks; I explicitly input perverse combinations or extreme amounts of data just to see how it behaves at the limits, but this code almost never does any verification itself; I just open the app and look at it and interact with it. And later remove the test code once I'm satisfied. The only times I encounter classical test code is when I'm touching the (mostly backend) code that others have written. And usually the only reason I think about the tests is because some minor change I make to the code causes lots of tests to fail; or because I've discovered bizarre bugs in code that is supposedly "covered" by test code; or because I wonder how on earth they managed to write tests that just randomly fail 30% of the time you run them in the CI server even if the code it's testing is unchanged (i.e. simply re-running the test will succeed 70% of the time).
Usually when I break tests it's not because I have introduced a bug; often I have fixed a bug causing the test to fail since the asshole who wrote the test probably changed the test so it expected the buggy output rather than the correct output; or more infuriatingly it breaks because the test uses "mocks" that expect the code to work exactly as implemented so if I fix the code by moving an expensive function call outside the loop (thus making it way faster without changing the result) the so-called test fails because it "expected expensiveCalculation(fixedInput) to be called 30 times but got 1 calls". Sometimes I try to add a test to catch a known bug in code that has lots of tests that are neither readable, understandable nor able to catch the actual bugs that do exist in the code it's supposed to test; and not only does my new test fail but it also causes dozens of other tests to suddenly fail because the test code is leaving crap in the database between tests and expects that exact crap to be there later, so if I reset the test database it fails and if I insert something it fails because then there are more elements in the table than a dozen of tests expect; and even if I carefully try to manually remove the added elements it still might fail because this does not reset the primary key counter so maybe some later test bizarrely expects a primary key to be 2 and gets 3! And trying to fix the tests by destroying and re-initializing (and manually fixing all the tests that expected side-effects from previous tests) makes the tests 10 times slower since apparently initializing the database has a fixed cost of 20 seconds each time (which is probably why they didn't do that in the first place).
You might say that unit tests should not involve the database; but then what use are the tests then? Most of the software I've worked with has fairly complex frontend code that is not really testable with typical unit tests, and some backend that usually mostly just forwards the requests or data between the database and the frontend and occasionally forwards requests or data to a sub system. In most sensible projects the backend is really trivial and generally doesn't have obscure bugs; it either works or completely fails to compile and deploy. In one of the more insane projects I've worked on, all parts of the code are an unholy mess of ugly code forwarding requests and data back and forth through endless layers of abstractions and buzzword technologies up and down between needlessly many subsystems and even horizontally between subsystems, some of which are node servers using non-relational NoSQL databases (because that's "new and cool") for storing relational data half of which is stored in a different database perhaps written in go (because that's also a new and cool language) using a good relational database to store big blobs of non-relational data (because of course) half of which is partially duplicated in a different subservice called "statistics" also written in go and using a separate relational database with string fields implicitly referencing primary keys in a different database (but in a completely different style) that apparently was created in order to "improve performance" by moving some of the heavy statistics queries away from the "core" system.
Never mind that the primary purpose of both the mobile app and the dashboard (the two clients of the backend system) was to register or display statistics, meaning almost every frequently called query needed data from the two isolated postgres databases and relational data from the non-relational database, so it had to make multiple requests from the api-server down to the sub-servers and then iteratively merge this data in the api server and of course cache this result in an unreliable caching system otherwise it would be unusably slow, causing all sorts of weird cache invalidation issues. (Also of course every service, parameter, table, field in this system used incredibly generic names like "Group" or "Item" and lots of 1 letter variable names repeated all over the 10s of subsystems so it's also impossible to find anything by grepping since almost anything you look for will match hundreds of lines in every subsystem.) Anyway besides the horrible structural issues with this system (mainly that most requests required "joins" between isolated databases); almost all the actual bugs (as in wrong rather than just slow behaviour) originated in horribly complex queries involving date fields. All the incorrect queries had "unit tests" that failed to identify the bugs since the synthetic data used in the test was completely different from the data the actual code would insert. One of the tests did in fact occasionally fail though because of one of the actual bugs: because it used system time +/- some fixed offsets for timestamps in the test data rather than fixed time and one of the bugs in the buggy code was that it tried to split the dates into hour long buckets (apparently for "performance" or something) some of the queries reading the data used date
I think a lot of the problems you've experienced are more symptoms of wider problems that pervaded into the tests. Also, it sounds like the tests were written after the fact to verify implementation details that miss the point of what the code is intended to do (e.g. the 30 method calls: at the very least there should have been a comment explaining the significance of the number, and if there is no significance it shouldn't be tested). While I agree that testing is no panacea, it's absolutely a prerequisite for the success of complex systems, but just like anything else, if done poorly tests can become more of a liability than a benefit.
@@SteinGauslaaStrindhaug Unit tests should not involve the database because you are no longer testing a unit. When you start involving the database you move into integration testing territory. In the story you gave us the problem could have been caught using unit tests for the database queries themselves.
@@gyroninjamodder I don't care what you call the test, "unit" or "integration" whatever. But if a function more or less wrap a big SQL query, how do you test that function in any meaningful way without involving the database? If you mock/fake the database you're only testing that the programming language is able to call another function, which you have to assume is working anyway to write tests at all. Are you saying there is a way to unit test a SQL query without using a database? How does that work?
@@SteinGauslaaStrindhaug You test the SQL queries alone. Instead of testing a function which calls the database then does something with the results you have your test directly invoke the query and check the results. Doing it this way means that you will see that a unit test for a specific query is failing instead of a test touching a lot more logic.
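A rough sketch of what "invoke the query directly" could look like, using Python's built-in sqlite3 with an in-memory database as a stand-in. The `events` table, the hourly bucketing query and the helper names are all invented for illustration, and SQLite's date functions differ from Postgres, so this only shows the shape of such a test. Note the fixed timestamps rather than system time.

```python
import sqlite3

# Hypothetical query under test: count events per hour-long bucket.
HOURLY_COUNTS = """
    SELECT strftime('%Y-%m-%d %H:00', created_at) AS bucket, COUNT(*) AS n
    FROM events
    GROUP BY bucket
    ORDER BY bucket
"""

def make_test_db(timestamps):
    # A fresh in-memory database per test, so nothing leaks between tests.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO events (created_at) VALUES (?)", [(t,) for t in timestamps]
    )
    return conn

def hourly_counts(conn):
    return conn.execute(HOURLY_COUNTS).fetchall()

# Fixed timestamps placed on both sides of an hour boundary.
conn = make_test_db(
    ["2024-01-01 09:59:59", "2024-01-01 10:00:00", "2024-01-01 10:30:00"]
)
rows = hourly_counts(conn)
print(rows)  # [('2024-01-01 09:00', 1), ('2024-01-01 10:00', 2)]
```

Whether that counts as a unit test or an integration test is the naming question from earlier in the thread. The point is only that the query is exercised on its own, with data chosen to sit on the bucket boundaries.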
At least for addition, you can turn a specific example into a test suite, e.g. given adding 1 and 3 yields 4:

    []
    let ``Adding 1 and 3 to a number is the same as adding 4 to it``() =
        for _ in [1..100] do
            let x = randInt()
            let result = add(3, add(x, 1))
            let result2 = add(1, add(x, 3))
            let result3 = add(x, 4)
            Assert.AreEqual(result, result2)
            Assert.AreEqual(result, result3)
@@rainbowevil If add is called as add(add(x, 1), 1), then implementing x + 2y instead of x + y would pass that test. If add(1, add(1, x)) were used instead, implementing 2x + y would pass. I meant to create a single test set that could rule out a lot of incorrect implementations, hence scrambling the parameter order a bit.
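The claim can be checked mechanically. In this sketch the three check functions are my reading of the tests being discussed (in particular, comparing `add(1, add(1, x))` against `add(2, x)` is an assumption), and all names are illustrative.

```python
def x_plus_2y(x, y):
    return x + 2 * y

def two_x_plus_y(x, y):
    return 2 * x + y

def real_add(x, y):
    return x + y

def nested_on_left(add, x):
    # add(add(x, 1), 1) compared with add(x, 2)
    return add(add(x, 1), 1) == add(x, 2)

def nested_on_right(add, x):
    # add(1, add(1, x)) compared with add(2, x)
    return add(1, add(1, x)) == add(2, x)

def scrambled(add, x):
    # The mixed-order version: all three results must agree.
    return add(3, add(x, 1)) == add(1, add(x, 3)) == add(x, 4)

xs = range(-50, 50)
print(all(nested_on_left(x_plus_2y, x) for x in xs))      # True: wrong code passes
print(all(nested_on_right(two_x_plus_y, x) for x in xs))  # True: wrong code passes
print(any(scrambled(x_plus_2y, x) for x in xs))           # False
print(any(scrambled(two_x_plus_y, x) for x in xs))        # False
print(all(scrambled(real_add, x) for x in xs))            # True
```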
"if you sort a collection, the size should be the same" This guy never heard about GulagSort: check if the items are sorted, and any item that isn't, gets eliminated (from the top of my mind). 😂
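Joke aside, that is exactly the kind of implementation the size property is there to catch. A minimal sketch, with `gulag_sort` written from the description above rather than from any real library:

```python
def gulag_sort(items):
    # Keep an item only if it is not smaller than the last item kept.
    survivors = []
    for item in items:
        if not survivors or item >= survivors[-1]:
            survivors.append(item)
    return survivors

def is_sorted(items):
    return all(a <= b for a, b in zip(items, items[1:]))

data = [3, 1, 2]
result = gulag_sort(data)

print(result)                          # [3]
print(is_sorted(result))               # True: the "output is sorted" property passes
print(len(result) == len(data))        # False: the size property fails
print(sorted(result) == sorted(data))  # False: so does "same elements"
```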
def add(x, y):
    if isinstance(x, int) or isinstance(y, int):
        return x + y
    return 0  # fails for x and y both floats :)

We should add tests for several input types, and not forget the properties of + over the reals at least: ua-cam.com/video/IYzDFHx6QPY/v-deo.html
Nobody gonna point out that the thumbnail falsely uses the apostrophe twice and in the title he corrected it just once?! It's "programmers" and "tests" - just simple plural.
"Programmer's" is correct. It's intended to be possessive, not plural. This is "a guide for the lazy programmer", just like the Hitchhiker's Guide to the Galaxy is a guide intended for a hitchhiker. "1000's", on the other hand, is more controversial: some style guides say to write it that way, and others say to omit the apostrophe because it's a simple plural rather than a singular possessive. Also, I don't see "test's" anywhere, but maybe the thumbnail was fixed before I arrived. That seems likely if he fixed it in the title first.
As a tester I find this a nice brain exercise for the developer and a waste of resources for everyone else, because the end result of this does not help the organization in any way. At least the intersection of unit tests across multiple functions gives something palpable. The output of such a test needs to be re-structured in order to be presented somewhere in a human readable way. Also I truly hope that the Shrinker's algorithm was described as such for the sake of ease of understanding. Because otherwise it's written in a very bad way. It can be solved by generating random numbers between last non-failing and first failing. That's how boundaries are tested. Then, it's highly prone to false positives for implementations that have 1 failure in a range of values. Let's say it should only fail for value 31. Statistically the chances of getting a random for 31 out of 1..100 are... not quite big. The output of the test result is not very useful. The fact that it can be falsifiable combined with the fact that it failed after 23 tests combined with the fact that it needed 3 shrinks == "Fails for inputs bigger than 81." The fact that it took 3 shrinks cannot be reliably useful as a risk or likelihood assessment because it's random. With a different combination it can return 4 or 5 at worst (based on your description of the algorithm). So there's a big difference in the confidence factor between 33% and 20%. The number of tests again, varies. At best I can read it as "how difficult was it for the Shrinker to find the boundary". In conclusion, for me your presentation was good. Not the greatest sound but you did speak very clearly and you did explain simply enough to not have me re-watch it. Thank you for your time and please accept my like. The idea is acceptable for some use-cases but in almost every instance the unit tests are better. Usually at QA generators and builders for worst cases are used for a very long time without outputs more optimized.
Almost every acceptable testing framework has one and those boundaries are found in the specifications of the feature. The absolute only case when I find this being better than everything else is if I receive a blackbox which I don't know what and how it should do it. Then, yes, this method of discovery would be useful. Gladly I have never encountered this. Also gladly ISTQB covers all these types in depth. And for those special strings I do have a list of them that I do paste with scripts in whatever input. This tool has literally 0% chance of finding something I will not find in the first hour. P.S. I am not a gherkin fan.
I think that he should have covered a more complex example of PBT (property based testing) but as he said, it is difficult to fit it on a slide. Quickcheck and most production ready property testing libraries are quite sophisticated in generating and shrinking. The shrinking example he gave was more for easy explanation of what a shrinker does. Also, do not forget that a computer is running these tests, not a human. That means that millions of test cases can be generated and shrunk in a second. It does not replace human testing or completely replace unit tests. Unit tests are useful especially for testing known regressions and what the developer thinks are edge cases. PBT is just another tool in your arsenal to be more confident in the correctness of your code. Another interesting way of testing similar to PBT is fuzzing which has uncovered a lot of bugs in many opensource libraries. Check out this post : begriffs.com/posts/2017-01-14-design-use-quickcheck.html for more information on how quickcheck (Haskell) works.
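For readers wondering what a shrinker does mechanically, here is a deliberately naive sketch. It assumes the property fails for every value above some threshold, which real shrinkers do not need, and the names `find_failure`, `shrink` and `passes_up_to_81` are invented for this example. The last function works out the odds for the "fails only for 31" case raised earlier in the thread.

```python
import random

def passes_up_to_81(value):
    # Hypothetical property: holds for 1..81, fails for anything bigger.
    return value <= 81

def find_failure(prop, lo=1, hi=100, trials=100, seed=0):
    """Try random inputs and return the first one that falsifies the property."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = rng.randint(lo, hi)
        if not prop(value):
            return value
    return None

def shrink(prop, failing, lo=1):
    """Bisect towards the smallest failing input (assumes failures form one upper range)."""
    passing = lo - 1  # treated as a passing bound, never actually tested
    while failing - passing > 1:
        mid = (passing + failing) // 2
        if prop(mid):
            passing = mid
        else:
            failing = mid
    return failing

def chance_of_hitting(single_probability, trials):
    # Probability that at least one of `trials` random inputs hits the bad value.
    return 1 - (1 - single_probability) ** trials

failing = find_failure(passes_up_to_81)
print(shrink(passes_up_to_81, failing))         # 82
print(round(chance_of_hitting(1 / 100, 100), 2))  # 0.63
```

So a single bad value out of 100 is found in roughly 63% of runs with 100 random cases, which is one reason libraries let the case count be raised and why edge values are usually seeded into generators.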
@@espeon91 after reading your comment I think I can pinpoint why I don't trust it: Not all inputs of a function can exist by themselves. In case of an add function, the random integer generator is great. In case of a stock management function the integer generator is almost useless. In the latter the properties are the business logic. The output of the function is tied to the input in a way that it actually matters which stock goes to which product and in which order. I can very well imagine how I can use PBT in that context and it's borderline to TDD or data driven testing. So borderline that I'd argue that any other practice except PBT brings more advantages. Having the shitty spreadsheet of data driven testing is useful for linking the testing of this function to another. Having the TDD helps with the documentation. Having unit tests stands in between the two.
@@CliseruGabriel Which is exactly why not to use just the primitive int generator. PBT works with complex data types as mentioned in the video. So you will not be testing a list of ints but a type like StockPortfolio which will have a suitable range of acceptable values and properties. Using a method in a sub-optimal way does not mean the method is useless.
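As a sketch of what a domain-level generator might look like: everything here (the ticker symbols, the dict representation of a portfolio, the `buy` and `sell` rules) is made up for illustration and only hints at real business logic.

```python
import random

SYMBOLS = ["AAA", "BBB", "CCC", "DDD"]  # made-up ticker symbols

def random_portfolio(rng):
    # Generate only valid portfolios: at least one holding, positive quantities.
    symbols = rng.sample(SYMBOLS, k=rng.randint(1, len(SYMBOLS)))
    return {symbol: rng.randint(1, 500) for symbol in symbols}

def sell(portfolio, symbol, quantity):
    if quantity > portfolio.get(symbol, 0):
        raise ValueError("cannot sell more than is held")
    updated = dict(portfolio)
    updated[symbol] -= quantity
    return updated

def buy(portfolio, symbol, quantity):
    updated = dict(portfolio)
    updated[symbol] = updated.get(symbol, 0) + quantity
    return updated

def sell_then_buy_restores_portfolio(trials=500, seed=0):
    """Property: selling and re-buying the same quantity is a round trip."""
    rng = random.Random(seed)
    for _ in range(trials):
        portfolio = random_portfolio(rng)
        symbol = rng.choice(sorted(portfolio))
        quantity = rng.randint(1, portfolio[symbol])  # input depends on the portfolio
        if buy(sell(portfolio, symbol, quantity), symbol, quantity) != portfolio:
            return False
    return True

print(sell_then_buy_restores_portfolio())  # True
```

The quantity is drawn from what the generated portfolio actually holds, which is one way of handling inputs that cannot exist by themselves.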
@@chriswarburton4296 which, if put into words like you just wrote to me, gives us a test suite which doubles as documentation. By reading them, even if I have no clue about the project, it totally makes sense to me (kudos btw). But put into the framework presented in the video it gives "not much". And then, if we take the framework presented and add words to it, it is almost like taking Gherkin and adding random values to it. Which is almost like a dynamic version of data driven testing. Which is almost like having a bunch of well written unit tests. This is my "argument" against the presented framework. It doesn't bring much to the table because all the other approaches end up testing the properties. Whether they admit it or not. They are different representations of how to "invent" a set of values for which a logic must hold true. And the presented approach solves the problem of a very bad actor while taking away from readability. We can solve the same problem of the bad actor by adding random generation to the existing approaches while keeping the documentation part and have a more human usable result.
Tests don't work. Never have, never will. They simply represent the programmer's imagination of what could possibly go wrong with the software rather than the reality of what actually will go wrong. We have half a century of experience with serious bugs that have bypassed any and all test benches. End of story.
Yes, but it is that in any modular arithmetic greater than five, as well... so in reality you won't even pass a char vs. int vs. long error check with these kindergarten games. This is simply a man talking who is so stupid that he can't even tell just how stupid he is. ;-)
So your manager decides you need to improve quality so he hires one of them friendly neighborhood consultants. So the guy of course tells you that you don't have enough unit tests. And he decides to refactor your add function that has been working for years to show you small brain plebs how a big brain would do it. After he is done the number of tests has grown 2000 times, he took two weeks to do it but he is paid by the hour so really it was a waste of his time more than anything. A month later someone reports a bug, you investigate and you find that your add function changed from decimal to double, the tests took way too long to run, I mean really you are testing every possible combination now so it started getting slow. You ask the guy writing integration tests why he didn't write a test for this, and he tells you that the big brain told him there's no need to do it any more, edge cases are covered by unit tests now, and that he should focus on things that have real value to the user.
If you always return x, then add(x,y) is not equal to add(y,x) unless x and y are the same. It fails the very first test (commutativity, like the squirrel said).
EDFH isn't always evil, stupid, or lazy. Often they work for a boss who constantly screams "get it done and move on to the next feature."
Group theory and finite fields being applied to the add example. Brilliant!
I'm a noob programmer and I haven't thought of testing this way. Thanks
A fantastic lecture with lots of food for thought. Let the feast commence. Thanks so much.
This has actually answered most of my questions about testing, so I welcome it. It might raise some more questions later, though...
Thanks a lot for this talk! I learned a ton and I have never written a line of F#.
Note that these 3 properties do not uniquely define addition: Consider 8-bit bitwise OR vs. 8-bit addition.
x OR y = y OR x, x OR (y OR z) = (x OR y) OR z, and x OR 0 = x. However 1 + 1 = 2, but 1 OR 1 = 1.
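A minimal Python sketch of this point, assuming we only sample every fifth value in 0..255 to keep the check fast. The helper name `satisfies_three_properties` is made up for illustration. Both bitwise OR and ordinary addition pass the three properties, and only a concrete example tells them apart.

```python
import operator
from itertools import product

def satisfies_three_properties(f, values):
    """Check commutativity, associativity and identity-with-0 over the given values."""
    commutative = all(f(x, y) == f(y, x) for x, y in product(values, repeat=2))
    associative = all(
        f(f(x, y), z) == f(x, f(y, z)) for x, y, z in product(values, repeat=3)
    )
    identity = all(f(x, 0) == x for x in values)
    return commutative and associative and identity

# Assumption: a sample of 8-bit values (0, 5, ..., 255) rather than all 256.
SAMPLE = range(0, 256, 5)

or_passes = satisfies_three_properties(operator.or_, SAMPLE)
add_passes = satisfies_three_properties(operator.add, SAMPLE)

print(or_passes, add_passes)                      # both pass the three properties
print(operator.or_(1, 1), operator.add(1, 1))     # 1 versus 2
```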
That's the tricky thing, how do you capture the actual increment from a function without re-implementing it. To me, it seems like the properties provide a very good foundation, but ultimately you'll still want to include plain and simple "examples" to confirm the essence of what you want, in addition to the property testing that's being set. Ultimately most tests that require complexity will always start off with those straightforward example tests, so they're already there. Adding property tests as they're determined will provide additional strength.
you could add in the properties of the numbers being comparable. ex: If you add two integers x + y = z, the following properties must be true:
- if x > 0 then z > y,
- if x = 0 then z = y,
- if x < 0 then z < y.
this works if you take comparisons to be a more primitive operation than addition.
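The ordering idea above could be sketched roughly like this in Python. The function names, the input range of -1000..1000 and the fixed seed are all assumptions for illustration. Note that it rules out bitwise OR but would still accept something like 2x + y, so it complements the other properties and does not replace them.

```python
import random

def ordering_property_holds(add, x, y):
    """The sign of x decides whether the sum lands above, on, or below y."""
    z = add(x, y)
    if x > 0:
        return z > y
    if x == 0:
        return z == y
    return z < y

def check_ordering(add, runs=100, seed=0):
    """Try the ordering property on random pairs (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return all(
        ordering_property_holds(add, rng.randint(-1000, 1000), rng.randint(-1000, 1000))
        for _ in range(runs)
    )

real_add_ok = check_ordering(lambda x, y: x + y)
# Bitwise OR fails on the pair (1, 1): 1 | 1 == 1, which is not greater than y == 1.
or_ok_on_1_1 = ordering_property_holds(lambda x, y: x | y, 1, 1)
# 2x + y still satisfies the ordering property for this pair.
skewed_ok = ordering_property_holds(lambda x, y: 2 * x + y, 3, 4)
```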
That seems like a low-level implementation difference. That's still important to think about but maybe have it separate as a unit test for a u8 unsigned byte test for the `add` operation. At a more high level, the average person working in the domain would expect that addition is just associative, commutative and unital (even without using those words) because that's how addition of all real numbers (and complex ones) works. That way your property-based tests should pass even if the implementation uses a 32-bit float or a signed big int because it matches the spec of a "number". But if it's important to also have tests that care about the low-level bit encoding and assembly level operations, those could be added too.
"Adding 0 is the same as doing nothing". In other words: 0 is the identity value for addition, right? 0 is also the identity value for subtraction, but not for multiplication (1 is the identity value for multiplication)
14:20 The EDFH says
let add (x,y) =
    -1 * (-1 * y - x)
This is actually correct, isn't it? The implementation is kind of stupid, but still valid.
Isn't this just addition with extra steps?
This is a valid implementation, just not very "clean", but tests can't test for that haha
@@entcraft44 Any incorrect implementation will fail at least one test that a correct one passes, so this is the EDFH's final answer.
Or just return zero ... always
what if the EDFH said (in python)
def add(a,b):
    if a==0: return b
    if b==0: return a
    return 2
assert add(x, -x) == 0
@@wojciechwal2953
def add(a,b):
    if a==0: return b
    if b==0: return a
    return 0
associativity fails though: (1+2)+3 = 0+3=3 while 1+(2+3)=1+0=1
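For anyone who wants to run that counterexample, here is a small Python sketch. The name `edfh_add` is just a label for the implementation quoted above.

```python
def edfh_add(a, b):
    """The 'return 0 unless one argument is zero' implementation from the thread."""
    if a == 0:
        return b
    if b == 0:
        return a
    return 0

# Associativity: (1 + 2) + 3 versus 1 + (2 + 3)
left = edfh_add(edfh_add(1, 2), 3)    # edfh_add(0, 3)
right = edfh_add(1, edfh_add(2, 3))   # edfh_add(1, 0)

print(left, right)  # 3 1

# It does satisfy the inverse check add(x, -x) == 0 for these values.
inverse_ok = all(edfh_add(x, -x) == 0 for x in range(-5, 6))
```

So this implementation survives commutativity, identity and the inverse check, and it is associativity that exposes it.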
The audio is a little bad, but found it informative nonetheless. Thanks for the upload
typical online conference, I guess. I'm surprised there were no cat or baby sounds in the background
@@reformed_attempt_1 I'm pretty tolerant of household noises in the background at this point. We're all in this together!
@@jameshoiby we are? Did you get that from MSM? Covid 19 appears to ignore BLM protesters too.
Not everyone has good audio recording gear or space in their house
@@Vlfkfnejisjejrjtjrie he got it from not living with his head stuck in his ass
This totally sounded like something I'd see in Haskell and OCaml, wasn't disappointed.
I'm sure QuickCheck was the first library that implemented PBT, which was written in Haskell. So yeah.
14:20 EDFH would just perform addition normally. Then if the second value was zero, return the answer. If the second value wasn't zero, multiply the answer by 2 and return it. His goal isn't to write less code that works. His goal is to hand in as much code as possible that doesn't work.
The only real solution is to get the EDFH fired and burn down and re-implement every file he ever touched.
Indeed. Property testing relies on the code performing the same actions on the entire testbed, so literally any perfect code can be ruined by the EDFH adding a special case at the top that kills the program whenever one specific input is processed. That's why no amount of testing will ever negate the need for proofreading other people's code.
This will still fail test 1 (Adding 1 twice is the same as adding 2).
Let's say x = 10, then x+1 will be 22, then 22 + 1 will be 46. But x+2 will be 24
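The arithmetic above can be reproduced with a short Python sketch. `doubling_add` is a hypothetical name for the "double it unless the second value is zero" implementation described earlier in this thread.

```python
def doubling_add(x, y):
    """Add normally, then double the answer unless the second value is zero."""
    answer = x + y
    if y == 0:
        return answer
    return answer * 2

x = 10
add_one_twice = doubling_add(doubling_add(x, 1), 1)  # 22, then 46
add_two = doubling_add(x, 2)                         # 24

print(add_one_twice, add_two)  # 46 24
```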
@@Pezsmapatkany only for that particular input value. there are 100 different variants that must all pass
It's funny that this criticism reveals the actual problem of Wlaschin's examples: arithmetic addition is an operation that is well-studied and its properties are extremely well-established. Which is to say he's correct: the _definition_ of an arithmetic addition operation is that, given two real numbers, the operation obeys the commutative property, associative property, and identity property.
Pezsmapatkany's point is that Warp Zone's algorithm fails the associative property test. But Warp Zone's actual conclusion is still correct: they merely chose the wrong example. In the vast majority of real-world cases, we do not understand the operation as thoroughly as mathematicians understand arithmetic addition. We do not have a coherent and complete set of properties to draw from to describe all given computations, never mind the abstractions on top of that like GUIs.
Amazing stuff! Watching this for the fifth time to absorb it all
This is very interesting as a mathematical proof, but I am having trouble understanding how to apply it as a tester without a degree in mathematics. I ran into this reimplementation problem while trying to test a calculator on my last project. It proved to be beneficial because having two implementations in 2 different languages developed independently is a rock solid test. It was also a great way to find bugs without having to manually sit down with a calculator to try to figure out expected results. I did have to think up some input values, though.
This was a great presentation and gives me a lot to think about.
I don't think two different implementations are a rock-solid test. If one fails to understand every detail of a requirement, the two implementations will most likely have the same conceptual flaws. But that can also be true if one just writes tests against specific parameters. I am not sure I really think the shown approach is efficient, but it can definitely lead to a deeper understanding.
Development was born from computer science which was born from (discrete) math which is mainly algebra and logic. Even though none of us can be a master at everything and we all have to start somewhere, I don't believe it's a good habit for us to scoff at the original foundations of our craft. Airplane designers are expected to understand fluid dynamics so why should we not at least try our best to understand the more rigorous parts of good program design. It's true that unfortunately academics obscure these simple principles behind "monad", "commutative" and the like but most terms that we aren't scared of are themselves from math: functions were mathematical first, classes are a mathematical notion too, etc. I believe it's been programmers' avoidance of the deep ideas that's led to many of the problems in code nowadays. That doesn't mean the more pragmatic "just get it done" mindset hasn't also been useful compared to a theory-only approach. But there's a reason most mathematical objects are themselves only described as a tuple of primitive notions with certain properties that always hold: it's the best mix of generality and correctness. So maybe finding a way of making a (programming) class that has one method which only takes itself and another object of the same type and maps it to a new object of its same type in a way that explicitly is associative and commutative is much more useful than having a class with 20 different methods which take 10 different parameters of various types to do crazy things. But at least we can feel good that we're watching videos like this, trying to learn to go back to the mathematical roots to do things correctly.
Great talk. Only thing that I wonder about is at the very end with the facial recognition examples. I don’t think it is given that a facial recognition software would necessarily place the “box” around the face in the exact same way on two images that are otherwise identical but where one has been rotated at an angle. Even with a simple angle like 90, 180 or 270 degrees. And likewise, turning one copy of the image into black and white could probably affect the result too.
How to handle exceptions with this? Like when creating properties for a division operator, we would want to handle division by zero separately. Also, for addition, how to handle arithmetic overflows if the generated numbers are too big?
@Chris Warburton - I disagree that this approach makes you consider the edge cases up front when looking at the overflow problem. If you look at the rules for addition at 14:30 and you’re using an addition implementation which overflows silently (which maybe you didn’t want, but didn’t know you didn’t want it beforehand) you will find all those tests pass. Or even if the implementation spits out garbage when overflowing, but deterministic garbage, it will pass the tests.
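To make the silent-overflow point concrete, here is a Python sketch that simulates 32-bit two's-complement wraparound (Python's own integers never overflow, so the wrapping is an assumption modeled by hand). The function names and the fixed seed are illustrative only. The wrapping add passes commutativity, associativity and identity on random inputs, even though INT_MAX + 1 comes out negative.

```python
import random

INT_MAX = 2**31 - 1
INT_MIN = -(2**31)

def wrapping_add(x, y):
    """Simulated 32-bit signed addition that wraps silently on overflow."""
    return (x + y - INT_MIN) % 2**32 + INT_MIN

def passes_all_properties(add, runs=100, seed=0):
    """Run the three addition properties on random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(runs):
        x, y, z = (rng.randint(INT_MIN, INT_MAX) for _ in range(3))
        if add(x, y) != add(y, x):              # commutativity
            return False
        if add(add(x, y), z) != add(x, add(y, z)):  # associativity
            return False
        if add(x, 0) != x:                      # identity
            return False
    return True

overflow_result = wrapping_add(INT_MAX, 1)
properties_ok = passes_all_properties(wrapping_add)

print(properties_ok)               # True
print(overflow_result == INT_MIN)  # True: the overflow wrapped around
```

If overflow should raise or saturate instead, that expectation would need its own property or example test.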
Ultimately, you're looking to check that your function (add, divide, more complicated function f) satisfies certain behaviors.
This is where knowing advanced math- group theory, type theory, formal math for functions- has practical use for software development. You can literally write "operator+" for your objects that meets the mathematical definition of a group operator, ensuring it'll behave in the way users & other code expects from 'addition'. From there you can do similar things like guarantee your "undo" is a true inverse function, or your UUID function is bijective (one to one). And so forth. You use the PROPERTIES a function, object, etc. should have to guide its actual code & then use those same properties to black-box test it in unit tests.
An advanced example is class types that form a group or ring. As in, I have objects X that have APIs allowing consumers to transform an X into a different X. By implementing class X and its APIs so that the set of 'X' forms a mathematical group (or ring, field, etc.), I guarantee that no sequence of calls to those APIs can generate an invalid 'X' object as an output, without lots of wasteful or brittle checks in the production code. And the same formal math gives clear direction on what tests I need to have 100% coverage for this subassembly.
@@zackyezek3760 I agree that learning more discrete math and abstract algebra has really changed how I try to write code. When we're in the imperative mindset, we think of code as a book of recipes for how to bake a data cake. But in the declarative math mindset, it's better to just describe what a cake has and the invariant properties a cake should have and accept whatever result matches those criteria. The radical simplicity allows for fewer false positive and false negative errors in the outcomes by being both rigorous and general.
The solution presented at 13:00 still passes all tests listed at 14:30(return 0)
No, the 3rd test will fail (Adding zero is the same as doing nothing), the return value should be x and not 0
@@Pezsmapatkany but you can just add two if statements like so:
int add(int x, int y){
    if(x == 0)
        return y;
    if(y == 0)
        return x;
    return 0;
}
@@Subject38 Yeah, but that will fail for the 1st test (Adding 1 twice is the same as adding 2)
@@Pezsmapatkany only if the test tries 0+1+1 and 0+2. If it's a random test, it's not that unlikely that 0 won't come up.
@@o.sunsfamily Well, then you would still get a flaky test, so cannot get away with it in the long run, however you're not right.
Let's say x = 10, then x+1 will result in 0, then 0+1 will result in 1, and x+2 will result in 0.
I've never liked this term lazy programmer. I prefer efficient system oriented designer and programmer using computer science, structured execution flow, modular programming, OOD/OOP, and more.
Lazy programmer is way shorter.
@@jeffwells641 also more efficient
I like it, because it is an oxymoron. In order to become lazy you first need to do more work than most.
Really good examples... In my head trying to avoid your own implementation of the method to test the method... This is the answer... Wicked
EDFH - now I know how to name it.
Indeed I've met once such an implementation, in a real production chartplotter software, and it made memories for life for me.
"The EDFH can't create an incorrect implementation!" well... I paused at 19:13 to say it doesn't prevent the misguided dev from hardcoding a giant lookup table or incrementing/decrementing a copy of x, y times... I hope that sometime in the next N minutes you'll say "here's how to define & impose compute/memory constraints".
He wouldn’t be lazy :)
I remember being confused by PBT, because properties meant "variables within a class", and not "mathematical properties, such as commutativity" to me.
Another reason why functional programming is never going to be popular with newbies: it overloads jargon with its own semantics, creating more confusion than clarification. I learned PBT and FP despite its jargon, not because of it.
This is very true! I think it's a form of gatekeeping because people feel smart when they use obscure terms. And let's face it that when you actually describe category theory concepts in simple English, they almost seem trivially obvious. But these core ideas are so useful precisely because they make everything so obvious when applied, and that's a hallmark of good design. I feel like there's a huge need in industry to make academic ideas more palatable to practitioners so that devs stop reinventing wheels with new frameworks and instead get the composition patterns right in the ways that a century of math has evidence for its success. I really appreciate people like Scott Wlaschin for not talking down to us with obscure terminology and focusing on the essential ideas so that we can actually form better coding habits!
What Test would you write to get around the EDFH if they wrote the add function such that if one of your inputs is zero, it returns the other variable, else it returns 0?
Associativity
In part V when you mention model based testing, is it basically just making an oracle as you mentioned earlier?
Thanks, this was well done.
Great talk. Thank you
While this seems like a good idea for dealing with actively malicious coders, or for testing library code and really mission critical rocket control systems etc., it does seem really hard to write for any code that is just slightly more complex than the trivial examples. Which to be fair is also a problem with writing good example based tests that actually test something useful.
I may be slightly biased since I'm primarily a frontend coder; and in frontend code the hard logic is trivial but the real issues are basically untestable. Frontend code is "correct" when it looks good, feels good and is understandable to a human; so the only way to test that is to have an actual human test it. I almost never write unit tests/automated tests (as in test code that inputs data and expects things about the output); I do write a lot of code in order to test things though; and that code often uses randomly generated inputs as well: but that's mostly code that generates synthetic content to test how it looks; explicitly inputting perverse combinations or extreme amounts of data just to see how it behaves at the limits, but this code almost never does any verification itself; I just open the app and look at it and interact with it. And later remove the test code once I'm satisfied.
The only times I encounter classical test code is when I'm touching the (mostly backend) code that others have written. And usually the only reason I think about the tests is because some minor change I make to the code causes lots of tests to fail; or because I've discovered bizarre bugs in code that is supposedly "covered" by test code; or because I wonder how on earth they managed to write tests that just randomly fails 30% of the time you run them in the CI server even if the code it's testing is unchanged (i.e. simply re-running the test will succeed 70% of the time).
Usually when I break tests it's not because I have introduced a bug; often I have fixed a bug, causing the test to fail, since the asshole who wrote the test probably changed the test so it expected the buggy output rather than the correct output; or, more infuriatingly, it breaks because the test uses "mocks" that expect the code to work exactly as implemented, so if I fix the code by moving an expensive function call outside the loop (thus making it way faster without changing the result) the so-called test fails because it "expected expensiveCalculation(fixedInput) to be called 30 times but got 1 calls". Sometimes I try to add a test to catch a known bug in code that has lots of tests that are neither readable, understandable, nor able to catch the actual bugs that do exist in the code they're supposed to test; and not only does my new test fail, but it also causes dozens of other tests to suddenly fail because the test code is leaving crap in the database between tests and expects that exact crap to be there later, so if I reset the test database it fails, and if I insert something it fails because then there are more elements in the table than a dozen tests expect; and even if I carefully try to manually remove the added elements it still might fail because this does not reset the primary key counter, so maybe some later test bizarrely expects a primary key to be 2 and gets 3! And trying to fix the tests by destroying and re-initializing the database (and manually fixing all the tests that expected side effects from previous tests) makes the tests 10 times slower, since apparently initializing the database has a fixed cost of 20 seconds each time (which is probably why they didn't do that in the first place).
You might say that unit tests should not involve the database; but then what use are the tests then?
Most of the software I've worked with has fairly complex frontend code that is not really testable with typical unit tests, and some backend that usually mostly just forwards the requests or data between the database and the frontend and occasionally forwards requests or data to a sub system. In most sensible projects the backend is really trivial and generally doesn't have obscure bugs; it either works or completely fails to compile and deploy.
In one of the more insane projects I've worked on, all parts of the code are an unholy mess of ugly code forwarding requests and data back and forth through endless layers of abstractions and buzzword technologies up and down between needlessly many subsystems and even horizontally between subsystems, some of which are node servers using non-relational NoSQL databases (because that's "new and cool") for storing relational data half of which is stored in a different database perhaps written in go (because that's also a new and cool language) using a good relational database to store big blobs of non-relational data (because of course) half of which is partially duplicated in a different subservice called "statistics" also written in go and using a separate relational database with string fields implicitly referencing primary keys in a different database (but in a completely different style) that apparently was created in order to "improve performance" by moving some of the heavy statistics queries away from the "core" system. Never mind that the primary purpose of both the mobile app and the dashboard (the two clients of the backend system) was to register or display statistics, meaning almost every frequently called query needed data from the two isolated postgres databases and relational data from the non-relational database, so it had to make multiple requests from the api-server down to the sub-servers and then iteratively merge this data in the api server and of course cache this result in an unreliable caching system, otherwise it would be unusably slow, causing all sorts of weird cache invalidation issues. (Also of course every service, parameter, table, field in this system used incredibly generic names like "Group" or "Item" and lots of 1 letter variable names repeated all over the 10s of subsystems so it's also impossible to find anything by grepping since almost anything you look for will match hundreds of lines in every subsystem.)
Anyway besides the horrible structural issues with this system (mainly that most requests required "joins" between isolated databases); almost all the actual bugs (as in wrong rather than just slow behaviour) originated in horribly complex queries involving datefields. All the incorrect queries had "unit tests" that failed to identify the bugs since the synthetic data used in the test was completely different from the data the actual code would insert. One of the tests did in fact occasionally fail though because of one of the actual bugs: because it used system time +/- some fixed offsets for timestamps in the test data rather than fixed time and one of the bugs in the buggy code was that it tried to split the dates into hour long buckets (apparently for "performance" or something) some of the queries reading the data used date
I think a lot of the problems you've experienced are more symptoms of wider problems that pervaded into the tests. Also, it sounds like the tests were written after the fact to verify implementation details that miss the point of what the code is intended to do (e.g. the 30 method calls: at the very least there should have been a comment explaining the significance of the number, and if there is no significance it shouldn't be tested). While I agree that testing is no panacea, it's absolutely a prerequisite for the success of complex systems, but just like anything else, if done poorly tests can become more of a liability than a benefit.
@@SteinGauslaaStrindhaug Unit tests should not involve the database because you are no longer testing a unit. When you start involving the database you move into integration testing territory. In the story you gave us the problem could have been caught using unit tests for the database queries themselves.
@@gyroninjamodder I don't care what you call the test, "unit" or "integration" whatever. But if a function more or less wrap a big SQL query, how do you test that function in any meaningful way without involving the database?
If you mock/fake the database you're only testing that the programming language is able to call another function, which you have to assume is working anyway to write tests at all.
Are you saying there is a way to unit test a SQL query without using a database? How does that work?
@@SteinGauslaaStrindhaug You test the SQL queries alone. Instead of testing a function which calls the database then does something with the results you have your test directly invoke the query and check the results. Doing it this way means that you will see that a unit test for a specific query is failing instead of a test touching a lot more logic.
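One way to read "test the SQL queries alone" is sketched below in Python with the standard-library sqlite3 module. Everything here is hypothetical: the `events` table, the `COUNT_PER_DAY` query and the helper names are invented for illustration, and it still uses a database engine, just a throwaway in-memory one. The test data uses fixed timestamps rather than system time, including values right on a day boundary.

```python
import sqlite3

# Hypothetical query under test: number of events per calendar day.
COUNT_PER_DAY = """
    SELECT date(created_at) AS day, COUNT(*) AS n
    FROM events
    GROUP BY day
    ORDER BY day
"""

def make_test_db():
    """Build a fresh in-memory database so no state leaks between tests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO events (created_at) VALUES (?)",
        [
            ("2020-01-01 23:59:59",),  # last second of the first day
            ("2020-01-02 00:00:00",),  # first second of the next day
            ("2020-01-02 12:30:00",),
        ],
    )
    return conn

def run_query(conn):
    """Invoke the query directly, with no application logic around it."""
    return conn.execute(COUNT_PER_DAY).fetchall()

rows = run_query(make_test_db())
print(rows)  # [('2020-01-01', 1), ('2020-01-02', 2)]
```

Whether an in-memory engine behaves closely enough to the production database is a separate judgment call.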
Another option to force the EDFH is use the inverse of addition, eg, x - -y == add(x,y)
That's just an implementation, the EDFH will just copy "x - -y" from the test. So all you're checking is that x - -y is equal to x - -y
Wow!! Very helpful!Thanks for the upload!! hehe
Love the anecdote...
This is how I imagine GPT-3 based programs would look like
You were spot on. haha
Tdd should make code more and more general, not more and more specific, so 5:30 definitely not tdd, just doing things stupidly
how is tdd meant to improve generality? (not saying you're wrong)
And that leads to Test-Driven Development.
great talk :PPPP loved the content very useful
At least for addition, you can turn a specific example into a test suite, e.g. given adding 1 and 3 yields 4:
[<Test>]
let ``Adding 1 and 3 to a number is the same as adding 4 to it``() =
    for _ in [1..100] do
        let x = randInt()
        let result = add(3, add(x, 1))
        let result2 = add(1, add(x, 3))
        let result3 = add(x, 4)
        Assert.AreEqual(result, result2)
        Assert.AreEqual(result, result3)
Was this not covered by the associativity test he showed? Where adding 1 twice is the same as adding 2?
@@rainbowevil If add is called as add(add(x, 1), 1), then implementing x + 2y instead of x + y would pass that test. If add(1, add(1, x)) were used instead, implementing 2x + y would pass. I meant to create a single test set that could rule out a lot of incorrect implementations, hence scrambling the parameter order a bit.
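A quick Python sketch of that argument. The names `x_plus_2y`, `two_x_plus_y` and the three check functions are made up here. Each wrong implementation slips past one ordering of the "add 1 twice" test but fails the scrambled version.

```python
def x_plus_2y(x, y):
    """Wrong implementation: weights the second argument twice."""
    return x + 2 * y

def two_x_plus_y(x, y):
    """Wrong implementation: weights the first argument twice."""
    return 2 * x + y

def add_one_twice_outer(add, x):
    """add(add(x, 1), 1) == add(x, 2)"""
    return add(add(x, 1), 1) == add(x, 2)

def add_one_twice_inner(add, x):
    """add(1, add(1, x)) == add(2, x)"""
    return add(1, add(1, x)) == add(2, x)

def scrambled(add, x):
    """The mixed-order test: adding 1 and 3 in either order equals adding 4."""
    result = add(3, add(x, 1))
    result2 = add(1, add(x, 3))
    result3 = add(x, 4)
    return result == result2 == result3

xs = range(-20, 21)
outer_fooled = all(add_one_twice_outer(x_plus_2y, x) for x in xs)
inner_fooled = all(add_one_twice_inner(two_x_plus_y, x) for x in xs)
scrambled_catches_both = (
    not all(scrambled(x_plus_2y, x) for x in xs)
    and not all(scrambled(two_x_plus_y, x) for x in xs)
)
real_add_passes = all(scrambled(lambda a, b: a + b, x) for x in xs)
```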
"if you sort a collection, the size should be the same"
This guy never heard about GulagSort: check if the items are sorted, and any item that isn't, gets eliminated (from the top of my mind). 😂
if(x
RandInt() doesn't return a value between int.MIN_VALUE and 100 inclusive, so this test would fail if one of the random numbers is out of range.
def add(x, y):
    if isinstance(x, int) or isinstance(y, int):
        return x + y
    return 0  # fails for x and y both floats :)
we should add tests for several input types and do not forget the properties of + over the reals at least
ua-cam.com/video/IYzDFHx6QPY/v-deo.html
enjoyed this 1
f(x, y) = f(y, x);
f(f(x, y), z) = f(x, f(y, z));
f(x, 0) = x;
Do these requirements specify only the operation add?
Nope, bitwise XOR, AND and OR also fit.
@@kvdveer f(x , 0) = x doesn't hold for bitwise AND
Maybe chuck in an f(x, -x) = 0 too
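A small Python sketch comparing the candidates mentioned in this sub-thread. It assumes Python's unbounded signed integers and a small range of test values. Commutativity and associativity hold for all four operations, so only the identity and inverse properties are checked here.

```python
import operator

CANDIDATES = {
    "add": operator.add,
    "xor": operator.xor,
    "and": operator.and_,
    "or": operator.or_,
}

# Assumption: a small range of signed integers is enough to show the differences.
VALUES = range(-8, 9)

def has_identity(f):
    """f(x, 0) == x for every sampled x."""
    return all(f(x, 0) == x for x in VALUES)

def has_inverse(f):
    """f(x, -x) == 0 for every sampled x."""
    return all(f(x, -x) == 0 for x in VALUES)

identity_results = {name: has_identity(f) for name, f in CANDIDATES.items()}
inverse_results = {name: has_inverse(f) for name, f in CANDIDATES.items()}

print(identity_results)  # AND is the only one that fails f(x, 0) == x
print(inverse_results)   # only real addition satisfies f(x, -x) == 0
```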
Nobody gonna point out that the thumbnail falsely uses the apostrophe twice and in the title he corrected it just once?! It's "programmers" and "tests" - just simple plural.
"Programmer's" is correct. It's intended to be possessive, not plural. This is "a guide for the lazy programmer", just like the Hitchhiker's Guide to the Galaxy is a guide intended for a hitchhiker. "1000's", on the other hand, is more controversial: some style guides say to write it that way, and others say to omit the apostrophe because it's a simple plural rather than a singular possessive.
Also, I don't see "test's" anywhere, but maybe the thumbnail was fixed before I arrived. That seems likely if he fixed it in the title first.
It's funny there's an E in EDFH.
As a tester I find this a nice brain exercise for the developer and a waste of resources for everyone else, because the end result of this does not help the organization in any way. At least the intersection of unit tests across multiple functions gives something palpable. The output of such a test needs to be re-structured in order to be presented somewhere in a human readable way.
Also I truly hope that the Shrinker's algorithm was described as such for the sake of ease of understanding. Because else it's written in a very bad way. It can be solved by generating random numbers between last non failing and first failing. That's how boundaries are tested.
Then, it's highly prone to false positives for implementations that have 1 failure in a range of values. Let's say it should only fail for value 31. Statistically the chances of getting a random 31 out of 1..100 are... not quite big.
The output of the test result is not very useful. The fact that it can be falsifiable combined with the fact that it failed after 23 tests combined with the fact that it needed 3 shrinks == "Fails for inputs bigger than 81." . The fact that it took 3 shrinks cannot be reliably useful as a risk or likelihood assessment because it's random. With a different combination it can return 4 or 5 at worst (based on your description of the algorithm). So there's a big difference in the confidence factor between 33% and 20%. The number of tests again, varies. At best I can read it as "how difficult was it for the Shrinker to find the boundary".
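For a rough feel of those odds, here is the arithmetic as a Python sketch. It assumes 100 independent runs, each drawing uniformly from 1..100. Real generators are often not uniform, so treat the number as a ballpark only. The function name is made up.

```python
def chance_of_hitting(failing_values, range_size, runs):
    """Probability that at least one of `runs` uniform draws lands on a failing value."""
    miss_once = 1 - failing_values / range_size
    return 1 - miss_once ** runs

# One failing value (31) among 1..100, tried 100 times.
p = chance_of_hitting(1, 100, 100)
print(round(p, 3))  # 0.634
```

So under these assumptions a single run of 100 cases would miss the bug roughly one time in three.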
In conclusion, for me your presentation was good. Not the greatest sound but you did speak very clearly and you did explain simple enough to not have me re-watch it. Thank you for your time and please accept my like. The idea is acceptable for some use-cases but in almost every instance the unit tests are better. Usually at QA generators and builders for worst cases are used for a very long time without outputs more optimized. Almost every acceptable testing framework has one and those boundaries are found in the specifications of the feature.
The absolute only case when I find this being better than everything else is if I receive a blackbox which I don't know what and how it should do it. Then, yes, this method of discovery would be useful. Gladly I have never encountered this. Also gladly ISTQB covers all these types in depth. And for those special strings I do have a list of them that I do paste with scripts in whatever input. This tool has literally 0% chance of finding something I will not find in the first hour.
P.S. I am not a gherkin fan.
I think that he should have covered a more complex example of PBT (property based testing) but as he said, it is difficult to fit it on a slide.
Quickcheck and most production ready property testing libraries are quite sophisticated in generating and shrinking. The shrinking example he gave was more for easy explanation of what a shrinker does. Also, do not forget that a computer is running these tests, not a human. That means that millions of test cases can be generated and shrunk in a second. It does not replace human testing or completely replace unit tests. Unit tests are useful especially for testing known regressions and what the developer thinks are edge cases.
PBT is just another tool in your arsenal to be more confident in the correctness of your code. Another interesting way of testing similar to PBT is fuzzing which has uncovered a lot of bugs in many opensource libraries.
Check out this post : begriffs.com/posts/2017-01-14-design-use-quickcheck.html for more information on how quickcheck (Haskell) works.
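To give a rough idea of what a shrinker does, here is a deliberately naive Python sketch. It is not how QuickCheck or any real library implements shrinking. The names, the "halve or step down by one" strategy and the example property (fails for inputs bigger than 81, taken from the output discussed above) are all assumptions for illustration.

```python
import random

def find_minimal_failure(prop, failing):
    """Greedily move toward smaller inputs for as long as the property still fails."""
    current = failing
    improved = True
    while improved:
        improved = False
        # Try a big jump first (halving), then a small step.
        for candidate in (current // 2, current - 1):
            if 0 <= candidate < current and not prop(candidate):
                current = candidate
                improved = True
                break
    return current

def check_property(prop, runs=100, seed=0):
    """Return (test number, original failure, shrunk failure), or None if all runs pass."""
    rng = random.Random(seed)
    for i in range(1, runs + 1):
        n = rng.randint(1, 100)
        if not prop(n):
            return i, n, find_minimal_failure(prop, n)
    return None

def small_enough(n):
    """Example property that is false for inputs bigger than 81."""
    return n <= 81

shrunk = find_minimal_failure(small_enough, 97)
print(shrunk)  # 82, the smallest failing input
report = check_property(small_enough)
```

Whatever random failure is found first, this shrinker ends on the same boundary value, which is the part of the report that is meant to be read.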
@@espeon91 after reading your comment I think I can pinpoint why I don't trust it:
Not all inputs of a function can exist by themselves.
In case of an add function, the random integer generator is great. In case of a stock management function the integer generator is almost useless. In the latter the properties are the business logic. The output of the function is tied to the input in a way that it actually matters which stock goes to which product and in which order. I can very well imagine how I can use PBT in that context and it's borderline to TDD or data driven testing. So borderline that I'd argue that any other practice except PBT brings more advantages. Having the shitty spreadsheet of data driven testing is useful for linking the testing of this function to another. Having the TDD helps with the documentation. Having unit tests stands in between the two.
@@CliseruGabriel Which is exactly why not to use just the primitive int generator. PBT works with complex data types as mentioned in the video. So you will not be testing a list of ints but a type like StockPortfolio which will have a suitable range of acceptable values and properties. Using a method in a sub-optimal way does not mean the method is useless.
@@chriswarburton4296 which if put into words like you just wrote to me gives us a test suite which doubles down as documentation. By reading them even if I have no clue about the project it totally makes sense to me (kudos btw). But put into the framework presented in the video gives "not much". And then, if we take the framework presented and add words to it is almost like taking Gherkin and adding random values to it. Which is almost like a dynamical version of data driven testing. Which is almost like having a bunch of well written unit-tests.
This is my "argument" against the presented framework. It doesn't bring much to the table because all the other approaches end up testing the properties. Even if they admit it or not. They are different representations of how "invent" a set of values for which a logic must hold true. And the presented approach solves the problem of a very bad actor while taking away from readability. We can solve the same problem of the bad actor by adding random generation to the existing approaches while keeping the documentation part and have a more human usable result.
Tests don't work. Never have, never will. They simply represent the programmer's imagination of what could possibly go wrong with the software rather than the reality of what actually will go wrong. We have half a century of experience with serious bugs that have bypassed any and all test benches. End of story.
let add(x, y) =
    if x == 0:
        y
    else if y == 0:
        x
    else:
        0
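For anyone who wants to try it, here is a rough Python transcription of the snippet above, checked against the three properties quoted earlier in the thread (commutativity, associativity, identity). The `find_counterexample` helper, the value range and the trial count are my own assumptions, not anything from the talk.

```python
import random


def add(x, y):
    # Python transcription of the degenerate implementation above.
    if x == 0:
        return y
    elif y == 0:
        return x
    else:
        return 0


def commutative(x, y):
    return add(x, y) == add(y, x)


def identity(x):
    return add(x, 0) == x and add(0, x) == x


def associative(x, y, z):
    return add(add(x, y), z) == add(x, add(y, z))


def find_counterexample(prop, arity, trials=1000, seed=0):
    """Return the first random argument tuple that falsifies `prop`, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = tuple(rng.randint(-100, 100) for _ in range(arity))
        if not prop(*args):
            return args
    return None


if __name__ == "__main__":
    for name, prop, arity in [
        ("commutative", commutative, 2),
        ("identity", identity, 1),
        ("associative", associative, 3),
    ]:
        print(name, "counterexample:", find_counterexample(prop, arity))
```

Commutativity and identity hold for this function, but associativity does not: for nonzero x, y, z the left side collapses to z and the right side to x, so any x != z is a counterexample.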
MAXINT + 1
Two plus two is four, minus one is three, quick maths!
Yes, but that also holds in any modular arithmetic with a modulus greater than five... so in reality you won't even pass a char vs. int vs. long error check with these kindergarten games. This is simply a man talking who is so stupid that he can't even tell just how stupid he is. ;-)
So your manager decides you need to improve quality, so he hires one of them friendly neighborhood consultants. So the guy of course tells you that you don't have enough unit tests. And he decides to refactor your add function that has been working for years to show you small-brain plebs how a big brain would do it. After he is done, the number of tests has grown 2000 times. He took two weeks to do it, but he is paid by the hour, so really it was a waste of his time more than anything. A month later someone reports a bug. You investigate and you find that your add function changed from decimal to double, and the tests took way too long to run. I mean, really, you are testing every possible combination now, so it started getting slow. You ask the guy writing integration tests why he didn't write a test for this. He tells you that the big brain told him there's no need to do it any more, edge cases are covered by unit tests now, and that he should focus on things that have real value to the user.
lul, cool story bro
Is there a point to this screed?
he could just return x to pass the three tests
That would fail commutativity
If you always return x, then add(x,y) is not equal to add(y,x) unless x and y are the same. It fails the very first test (commutativity, like the squirrel said).
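A quick sketch of that failure in Python, for illustration only (`lazy_add` is a made-up name for the "just return x" implementation):

```python
def lazy_add(x, y):
    # The "just return x" implementation suggested above.
    return x


def commutative(x, y):
    # Commutativity: the order of the arguments must not matter.
    return lazy_add(x, y) == lazy_add(y, x)


if __name__ == "__main__":
    print(commutative(3, 3))  # same value on both sides, so the check passes
    print(commutative(1, 2))  # lazy_add(1, 2) is 1 but lazy_add(2, 1) is 2
```

Any random generator that produces two different integers will hit the failing case almost immediately.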
The whole point of programming is to be lazy
To automate
lol