
Purposeful Testing or Why the Test Pyramid is a Scam

The test pyramid is a metaphor that illustrates the “ideal” weight of tests depending on their granularity. It goes from unit tests at the base, which are the most numerous, to end-to-end tests at the top, which should be scarce.

It is often contrasted with the “ice cream cone” anti-pattern, which features a few unit tests and a massive amount of manual end-to-end tests.

I think this pattern is a good heuristic, especially when your test suite looks like an ice cream cone, but it also fuels a lot of sterile discussions, such as:

  • What is a unit test? What is an integration test?
  • Do I test private methods?
  • Do I test pass-through methods? Do I test getters, setters?
  • What is ideal code coverage? Should I go to jail if I have less than 100% coverage?

Purposeful Testing

Behind this click-bait title, I would like to go beyond the test pyramid and start reasoning in terms of purpose rather than form. Whenever I write tests, I ask myself the following questions to make sure they are fit for purpose:

  • What is the purpose of this test? Knowing what the test is supposed to check prevents me from trying to test too much and ultimately writing something that will hurt me in the long run. Let’s remember that tests are similar to production code in that they require maintenance.
  • What would make this test fail? By trying to answer this question, I often realise that the test failing would not actually mean it has achieved its purpose. Worse, sometimes the test simply cannot fail. Tests are not supposed to check everything; they should keep passing when something they are not meant to cover breaks.
  • Will it impede our ability to refactor? A test that is too heavily coupled with the production code can actually be counterproductive: it disincentivises refactoring and hurts maintainability. A typical symptom of a test that impedes refactoring is the overuse of mocks (a sketch follows this list).
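
As an illustration, here is a minimal sketch in Python (all names are hypothetical) contrasting a test that is coupled to internal calls with one that only checks observable behaviour:

    from unittest.mock import MagicMock


    class OrderService:
        def __init__(self, repository):
            self._repository = repository

        def place_order(self, item, quantity):
            self._repository.save(item, quantity)


    class InMemoryRepository:
        def __init__(self):
            self.saved = []

        def save(self, item, quantity):
            self.saved.append((item, quantity))


    def test_coupled_to_internals_impedes_refactoring():
        repository = MagicMock()
        OrderService(repository).place_order("book", 2)
        # Pinning the exact internal call breaks the test on any rename or restructuring
        # of the persistence layer, even when the behaviour stays the same.
        repository.save.assert_called_once_with("book", 2)


    def test_focused_on_observable_behaviour():
        repository = InMemoryRepository()
        OrderService(repository).place_order("book", 2)
        # Asserting on the outcome leaves room to refactor how the order is stored.
        assert repository.saved == [("book", 2)]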

Test Taxonomy

By applying this reasoning, I identified the following types of test (this list is far from complete and will get updated in the future).

Support Tests

These tests aim to support the development of an algorithm or standalone class (i.e. no dependencies). You would typically start with very straightforward code and incrementally add test cases to widen the scope and cover new edge cases. I have found this type of test to be the perfect use case for Test-Driven Development.
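
A minimal sketch, assuming a hypothetical slugify function developed test-first, where each case was added incrementally:

    import pytest


    def slugify(title):
        return "-".join(title.lower().split())


    @pytest.mark.parametrize("title, expected", [
        ("Hello", "hello"),                    # first, the simplest possible case
        ("Hello World", "hello-world"),        # then spaces
        ("  Hello   World  ", "hello-world"),  # then an edge case with extra whitespace
    ])
    def test_slugify(title, expected):
        assert slugify(title) == expected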

Plumbing Tests

Plumbing tests ensure that several classes, external libraries, or third-party services work together as intended. In the test pyramid, these tests fall into the generally accepted definition of integration tests. The goal here is not to check all the edge cases but rather the assumptions made about the various APIs.
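
For example, a plumbing test might check that a hypothetical UserRepository and the standard sqlite3 module agree on how data is stored and read back, without chasing every edge case:

    import sqlite3


    class UserRepository:
        def __init__(self, connection):
            self._connection = connection
            self._connection.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")

        def add(self, name):
            self._connection.execute("INSERT INTO users (name) VALUES (?)", (name,))

        def all_names(self):
            return [row[0] for row in self._connection.execute("SELECT name FROM users")]


    def test_repository_roundtrip_through_sqlite():
        repository = UserRepository(sqlite3.connect(":memory:"))
        repository.add("Ada")
        # One happy-path roundtrip is enough to validate the wiring and our API assumptions.
        assert repository.all_names() == ["Ada"]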

Use Case Tests

These end-to-end tests make sure that a user scenario works as intended. Use case tests typically include roundtrips to the database and virtually no mocks. They are good candidates for double-loop TDD.
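
A minimal sketch of such a test, assuming a hypothetical shopping-cart service backed by an in-memory database:

    import sqlite3


    class CartService:
        def __init__(self, connection):
            self._db = connection
            self._db.execute("CREATE TABLE items (name TEXT, price REAL)")

        def add_item(self, name, price):
            self._db.execute("INSERT INTO items VALUES (?, ?)", (name, price))

        def total(self):
            return self._db.execute("SELECT COALESCE(SUM(price), 0) FROM items").fetchone()[0]


    def test_user_fills_cart_and_sees_the_total():
        cart = CartService(sqlite3.connect(":memory:"))
        cart.add_item("book", 12.0)
        cart.add_item("pen", 3.0)
        # The whole scenario runs against real collaborators, database roundtrips included.
        assert cart.total() == 15.0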

Orchestration Tests

Unlike use case tests, orchestration tests rely heavily on mocks. They are relevant for pieces of code that merely enforce a workflow and delegate to other dependencies. For instance, you may want to check that the code triggers a method call when some conditions are met. I usually make sure to keep these tests extremely lean, as too many mocks can seriously impede refactoring.
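
A minimal sketch with hypothetical names: the test only verifies that the workflow triggers the right call when the condition is met, and stays quiet otherwise:

    from unittest.mock import MagicMock


    class PaymentWorkflow:
        def __init__(self, notifier):
            self._notifier = notifier

        def process(self, amount):
            if amount > 1000:
                self._notifier.alert_fraud_team(amount)


    def test_large_payment_alerts_the_fraud_team():
        notifier = MagicMock()
        PaymentWorkflow(notifier).process(5000)
        notifier.alert_fraud_team.assert_called_once_with(5000)


    def test_small_payment_triggers_no_alert():
        notifier = MagicMock()
        PaymentWorkflow(notifier).process(10)
        notifier.alert_fraud_team.assert_not_called()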

Developer Experience (DX) Tests

As I always say to everyone willing to listen to me, tests do not only check your code’s behaviour, but also its design and usability. DX tests are extremely valuable when you are creating an API and want to verify that its usage makes sense and is straightforward. These tests should treat the API as a black box and make absolutely no assumptions about its internal behaviour.
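
A minimal sketch, assuming a hypothetical HttpClient facade: the test is written as the code a newcomer would type, and it only checks that the obvious usage works without ceremony:

    class HttpClient:
        def __init__(self, base_url):
            self._base_url = base_url

        def get(self, path):
            # The real transport is irrelevant here; a canned response keeps the sketch self-contained.
            return {"url": self._base_url + path, "status": 200}


    def test_the_obvious_usage_is_enough():
        # No builders, factories, or configuration ceremony for the common case;
        # the API is exercised purely as a black box.
        response = HttpClient("https://api.example.com").get("/users")
        assert response["status"] == 200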

Crutch Tests

Crutch tests are particularly useful when you want to tackle messy legacy code and refactor it. The objective is to extract a chunk of code, test it from the outside, and rewrite it entirely without touching the tests at all. These tests will typically include roundtrips to the database. The main challenge is to find seams to separate the code you want to rewrite from the rest; these seams are a good place to put mocks and inject data into the system under test. Once the refactoring is over, some of these tests might become redundant and can be removed.
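
A minimal sketch around a hypothetical legacy function, where the injected date acts as the seam; the assertion simply captures today’s behaviour so the body can be rewritten underneath it:

    from datetime import date


    def legacy_invoice_label(customer, today=None):
        # Imagine this body is the messy legacy code we plan to rewrite entirely.
        today = today or date.today()
        return customer.upper() + "-" + today.strftime("%Y%m")


    def test_invoice_label_characterisation():
        # The assertion pins down the current output; it must keep passing, untouched,
        # while the implementation is rewritten behind it.
        assert legacy_invoice_label("acme", today=date(2020, 5, 17)) == "ACME-202005"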

Confidence Tests

We often read that we should not test third-party libraries because they are already massively tested and it’s not our place to do it. While this may be true, I think it can be useful to test our assumptions about the libraries we use. Confidence tests make sure you understand how an external library works and ensure that its behaviour won’t change out from under you (e.g. during an upgrade). By the way, these tests can also apply to your own legacy code!
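
For instance, a confidence test can encode an assumption about the standard json module, so that an upgrade changing this behaviour would fail loudly in our own suite:

    import json


    def test_json_dumps_sorts_keys_when_asked():
        # We rely on deterministic key ordering elsewhere; this test states that assumption.
        assert json.dumps({"b": 1, "a": 2}, sort_keys=True) == '{"a": 2, "b": 1}'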

Documentation Tests

We should always strive to write the simplest code possible. However, it is sometimes not possible and we have to slip some hacky hacks into the codebase (e.g. optimisations, library or language shortcomings, etc.). In this case, I like to write documentation tests that explain the code’s behaviour so that other developers (or me, in a week) can understand what it does.
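
A minimal sketch, assuming a hypothetical price computation that works in integer cents: the test documents why the hack exists.

    def price_with_vat_in_cents(price_cents, vat_rate):
        # We work in integer cents because naive float arithmetic drifts; the test below explains why.
        return round(price_cents * (1 + vat_rate))


    def test_why_prices_are_computed_in_integer_cents():
        # The naive float version illustrates the problem the hack works around...
        assert 0.1 + 0.2 != 0.3
        # ...while the cent-based computation stays exact for the documented case.
        assert price_with_vat_in_cents(1000, 0.2) == 1200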

Scratch Tests

Scratch tests are short-lived tests that I sometimes use to check an assumption I have about the code. I use these tests to ask the code questions and see how it answers. I typically delete them as soon as my assumption is verified or discarded.
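
For example, a throwaway scratch test might ask the standard library how str.split handles consecutive separators, and be deleted right after:

    def test_scratch_how_split_treats_consecutive_commas():
        # Deleted as soon as the assumption is confirmed or invalidated.
        assert "a,,b".split(",") == ["a", "", "b"]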

Takeaways

For years, I struggled with the granularity of my tests because I felt I was writing some tests just for the sake of it. I also refrained from writing other tests that would have been useful, because they did not fit into any of the boxes the test pyramid provides.

When I started to step away from the test pyramid, I realised that I did not care that much about the number of tests or the balance between unit, integration, and end-to-end tests. I just write them when I feel they can bring something to the table.

I also noticed that I use far fewer mocks than I used to, because I no longer force myself to write “pure” unit tests. As a consequence, I feel that tests enable refactoring instead of impeding it.
