What is a good test?

Filed under: Clean code, Java, TDD, Test automation, — Tags: Bad tests, Dependency injection, Good test, JUnit, JUnit, Mockito, Parameterized JUnit, Readability, Test harness — Thomas Sundberg — 2012-03-08

A colleague asked that question the other day. What is a good test? It is a really difficult question to answer. There are a number of properties that hold true for a good test. I will list some of them and discuss why I think they are important. Others may think that the order of importance is different or that other properties are more important then those that I list.

I will show some examples using Java and JUnit of what I think are bad tests and how they can be refactored to something better.

Tests are automated in my world. I always try to avoid manually testing. The main reason for this is that manual testing is slow and hard to repeat arbitrary many times. Therefore, test should be read as automated test in for the rest of this text.

Different types of tests

Tests come in many different shapes and forms. They have different purposes. Putting them in different categories is difficult and dangerous. A lot of people have opinions on the subject so it is like stepping in a mine field. Never the less, one way to categorise different tests may be this one (as suggested by Alexandru Bolboaca):

Acceptance
Integration
System
Unit

A good acceptance test is written in the domain language and clearly communicates the intent of a feature or story. The purpose is to have a way to measure when a feature can be considered done. It is formulated and understood by a domain expert rather than a developer struggling with the domain knowledge. It is expected to have a life time that is as long as the life time for product.

An integration test is usually slow. The purpose is to verify the integration between components that makes up the system. One such component could be a database.

A system test is usually very slow and could involve operations in GUI. The purpose is to verify that things that are expected to be possible to do actually is possible to do as a normal user. It could be a Selenium script that exercises the GUI through a browser by adding things to the database and then retrieve them.

A unit test must be blazing fast, running in milliseconds. The purpose is to develop the functionality incrementally and in a testable way. It is only using the core classes of the system and everything is executed in memory. This is where the use of tools like JUnit started and normally the result of test driven development, TDD. Unit tests will only test one thing by following the Single Responsible Principle, SRP.

Dividing test into these classes may differ from your opinion. Please suggest an alternative division if you think there may be a need for it.

One common thing for all categories above is that they follow the three R's of automated tests:

Rapid
Reliable
Relevant

Why?

Before we start discussing what a good test is, let us ask ourselves why we want good tests? There are many reasons, most of them are so obvious that I have a hard time even understanding the question.

Tests act as the guarantor for the production system that guarantees that it delivers the value the customer is paying for. It is therefore important to remember that the tests are at least as important as the production code. Having a lot of tests with poor quality will get very expensive. Trusting bad tests may result in serious bugs that are not caught. Bad tests may actually be worse then bad production code.

But enough of the motivations, let's start discussing properties of good tests instead.

Common structure

All tests should follow this pattern:

Setup
Execute
Assert

All test has to setup the System Under Test, SUT. If there isn't any system, then there isn't anything to test. Some setup will always exist.

The thing that is about to be tested has to be executed. If you don't exercise the system under test, you don't test it.

The execution has to be asserted. No problems will ever be found if there isn't any assertion.

Common properties

Good tests will, at least, have these properties:

Clarity
Test one thing
Never use any conditions
Don't hide the setup
Avoid repetition
Don't catch exceptions
Limit the running time
Use dependency injection

Clarity

The most important thing with a test is that it is possible to understand what is being tested. The test is not tested by anything else so if it is difficult to understand what is being tested then it is a bad or useless test. Compare it with the analogy, who is watching the guard?

What makes a test easy to read? The answer is sadly that it depends. If you never have written any code, then everything is difficult. If you write a lot of code, then something reasonable difficult may be easy for you to read and understand.

It has been shown that most humans have problem juggling more then five things (or was it seven?) at the same time as they are working on a problem. This tells us that we have to minimise the number of things that will be taken into account during the setup, execution and the assertion of a test. One thing is the easiest case and therefore something to strive for.

Well crafted tests written in JUnit may be very good if they are read by developers. The requirements that define the system we should build may not be written by developers. There are other tools better suited for this task. Cucumber is one great tool that can be used for this.

Test one thing

A good test verifies one thing. If a test fails and there are more then one possible place that may be broken, then it is a bad test. If there are two different reasons why a test may break, then these two reasons are probably good reason to refactor one test into two tests.

Never use any conditions

A test that test exactly one thing will never have a need to contain any alternative paths of execution. This means that if statements should not be a part of a test. Tests that contains alternative paths of execution should be refactored to two or more tests.

There are developers that have a lot of problems with this. They think that in order to decrease code duplication it is better to write a test as compact as possible with different paths of execution so more than one situation can be verified in the same test. This is of course totally wrong, what they actually achieves are tests that are, unnecessarily, difficult to understand and read. Therefore they fall with the most important property of a test, Clarity.

Don't hide setup

Setting up a test means preparing the system under test so the condition you want to verify is possible to verify. This may mean creating objects, mocking some uninteresting objects and so on. If this part is hidden, it will be unnecessarily difficult to understand what the test actually verifies. It will be difficult to understand, by inspection, if this test is valid, tests what it claims to test and if it is correct. So hiding the setup is a very bad thing.

Avoid inheritance

Inheritance is sometimes a great way the collect common code in one place. All tests that inherits a specific superclass will by definition benefit from a common setup. This may sound like a good idea, but inheritance hides the setup in a great way. Inheritance may therefore may be one of the worst ideas that can be implemented.

Prefer composition

A better approach is to create utility classes if a specific setup has to be shared between tests. These utility classes has to be easy to use, do one thing and one thing only. They may contain small variations if small tweaks are necessary on the thing they setup, but this should be really small variations.

Use Before and After

It may turn out that you have similar setup and tear down for each test. In this case, use the @Before and @After mechanism of JUnit. The methods annotated with these tags will be executed before and after each test method. This is a great place to store common setup.

Use help methods

If you have a difficult setup, even if you don't share this setup with other test method, abstract them away to help methods if the details isn't important for the current test. It may be difficult to find an appropriate level when these help methods should be implemented and when they shouldn't. This is when your experience as a responsible developer kicks in, a responsible and experienced developer knows when to abstract stuff.

Avoid repetition

The use of for and while in a test is an indication that you want to do a lot of things. A test should do one thing. If you find that you have a need for repetition, ask yourself why. Is there another way to do this without repetition? Can you implement more test that verifies smaller parts? The urge for using repetition may actually be a sign of a refactoring that should take place.

Another argument for avoiding repetition is that it gets, unnecessarily, difficult to follow the code and verify that it is correct. Repetitions have boundary rules that defines how many times they should be executed. Making a mistake when defining the rules may result in a incorrect test. Should a for loop always start from 0 or should it start from 1? It depends and is therefore a risk that should be avoided if possible.

One situation where I think that this rule may be skipped is when you have to assert the content of a returned list. There may be a very simple rule that should hold true for all elements in the list and it is your job as developer to verify that this actually is true for all elements.

Suppose that the list always contains two elements, then I think that it is not a valid situation to pollute the test with repetition. It is better to have two asserts, one for the first element and one for the last element.

List where the length varies are candidates for repetition. List with more than two elements may also be candidates, but this will of course vary from time to time.

Always remember, repetition in a test should normally be avoided.

Don't catch exception

If the production code can throw an exception and this would be an error, then don't catch the exception. Have the test to throw it. Signaling in test systems like JUnit is done using exceptions so an exception will fail the test. You are able to fix the production code and run the test again.

Catching exceptions is something people are fond of. I have seen code like the one below. It has of course been re-written to protect the (not so) innocent.

@Test
public void exampleOfAStrangeTest() {
    // setup
    boolean thrown = false;
    try {
        executeSystemUnderTest();
    } catch (ApplicationSpecificException e) {
        e.printStackTrace();
        thrown = true;
    } catch (AnotherApplicationSpecificException e) {
        thrown = true;
        e.printStackTrace();
    }
    assertEquals("AnotherApplicationSpecificException is thrown, the test is broken!", false, thrown);
}

I am not entirely sure of the ideas behind this construction. But I assume that it is meant to be informative by printing the stack trace when exceptions are thrown and fail the build. But it misses a few points. It misses how signaling is done in JUnit. It also misses to assert the system, it only fails if exceptions are thrown.

The refactoring of code like this is obviously this:

@Test
public void refactoredExampleOfAStrangeTest() throws AnotherApplicationSpecificException, ApplicationSpecificException {
    // setup
    executeSystemUnderTest();
}

Exceptions are used for signaling in JUnit. There is therefore seldom any reason why out should catch an exception.

Another thing that could be commented on is the obvious missing assertions. Something has been executed and no exception was thrown. This is a good start, but where do we verify that the system actually has done anything? The missing assertion is not easily spotted in the first example, in the last example it is trivial to see that it is missing. The original code was larger, had more setup but missed asserting anything.

What about code that is expected to throw an exception then?

Use the built in ability of JUnit to catch expected exceptions like this:

@Test(expected = Exception.class)
public void executeSomethingThatIsExpectedToThrowAnException() {
    // SUT
}

It will fail if the expected exception isn't thrown.

There are situations when using the expected construction is not the best idea. Suppose that you expect an exception to be thrown and that some cleaning of the production system should occur in this case. Then you may want to assert that the cleaning is done properly. In those cases, catch the exception and perform the asserts in the catch clause. Make sure that you add a fail(); statement just after the statement that is expected to throw the exception. Add an argument to the fail("The system under test should have been broken here due to some reason"); so anybody executing the test can understand what happened when the test broke.

@Test
public void executeSomethingThatIsExpectedToThrowAnExceptionOldStyle() {
    try {
        executeSUT();
        fail("An ApplicationSpecificException was expected");
    } catch (ApplicationSpecificException e) {
        // perform assertions that expected cleaning was done properly
    }
}

Limit the running time

JUnit has a built in capability to limit the time a test is allowed to be executed. This is a great way to fail a test if it loops in infinity. It is also a great way to catch really slow tests that either need refactoring or is an indication of a performance problem with the production code.

@Test(timeout = 100)
public void limitTheRunningTime() throws ApplicationSpecificException {
    executeSUT();
}

The running time limit is set in milliseconds, the example above will fail if the running time is longer than 100 ms.

Limiting the running time is dangerous. It may produce false failures due to slow hardware. It has to be used with great care and the allowed time for running a test should be set reasonably generously. When a test fails due to a limited running time, make sure that you do a proper analysis of the reason why it failed so you really understand what caused the problem before increasing the limited running time.

It is a very important property of a unit test that it is fast so limiting the running time should be a very unusual action.

Use dependency injection

A class under test very seldom lives in solitude. It has other classes that it interacts with. This mean that there is a valid risk that you have to setup the entire world that this class lives in before you can test it properly. A technique that may be used to avoid creating concrete instances of all collaborators is to create mocks for them and use the mock during the tests. A problem may be that the mocks are hard to get into the class that need them. This is where the idea of dependency injection comes in.

Dependency injection is a very simple technique. You need a mechanism to set dependencies in a class. This is usually done either through the constructor or using methods that only sets instance variables in the class under test. Use this technique to set the mocks and go on with the real work, making the class behave as required of the system.

Constructors that use new are the work of a devil. It might also be the work of somebody that doesn't want the code to be possible to test. You don't have any control over the things that are created. They may create anything, including objects that forces you to execute the system in a particular environment connected to a specific network. They may update something so the result from the execution varies from time to time. Construction new objects in a constructor instead of supplying them in some way is just plain stupid and awful.

Make sure that all dependencies an object need are possible to supply to the class under test easily. Always make sure that you allow other to inject collaborators into a system under test. If you do this, then it is possible to cut dependency chains that otherwise may be very long. So long that the setup of the system under test is hard, or even almost impossible, to do.

Mocking

Mocks can be done manually or a mock framework can be used.

A null implementation of an interface may be a great way to create a mock manually. It will not be dangerous to use and it will not do anything.

If you want, you can often use a mocking framework. One of my favorite mocking framework is Mockito. It behaves nicely as long as you don't try to do something too exotic. If you do, then you either have a very peculiar problem or you are doing something wrong. Peculiar problems do exist, but they are so rare that you have to ask yourself what you are doing wrong instead. You are probably doing something wrong.

Test harness

Mocking objects will only take you so far. Mock objects are actually useless if you don't have access to the source code. Something different is needed in this situation. You want to create a test harness that will act as a the resource you need. It will behave deterministic and is therefore similar to a mock, but a large difference is that you can use it without access to the source code.

Using test mock objects or test harness both act as fakes that you have full control over. They solve similar problems, but in different situations.

Failures

A failure during development is a a good thing. You have found a bug and the test prevented you from delivering it to a user.

When a good test fails, the only thing that will make it pass again are changes in the production code. If a change in the test is needed, then it is not a good test.

If a test is an acceptance test and it fails due to an internal change in the production code, then it is a bad test that has to much knowledge about the internals of the system it tests.

If it is a unit test, then the only changes needed to make it pass again should be changes in the class it tests. If changes are needed in other classes, then it is a bad unit test. It either tests too much or is setup badly. Testing too much indicates that the test should be divided into two, or more, smaller tests. A bad setup implies that dependencies needed for the class under test isn't properly setup.

Four rules of simple design

Good test always follow the four rules of simple design that Kent Beck formulated in his XP books. I wrote about it a while ago in Four rules of simple design.

Following these rules strictly will guide you to good tests. Practicing, reading other peoples tests and contemplating will drive you in the direction of better tests.

Acknowledgements

This post has been reviewed by some people who I wish to thank for their help

Malin Ekholm
Johan Helmfrid
Alexandru Bolboaca
Johan Karlsson

Thank you very much for your feedback!

Resources

JUnit - a test framework
Mockito - a great mocking framework
Piotr Jagielski held a presentation at Agile Central Europe 2010 about refactoring tests.
The three R of test automation
Unit Tests Are FIRST - Fast, Isolated, Repeatable, Self-Verifying, and Timely
Thomas Sundberg - the author

(less...)