Creating confidence when testing

Filed under: TDD, Test automation, — Tags: BDD, JUnit, Java, Mockito, Testing, Unit tests — Thomas Sundberg — 2016-10-03

To gain confidence when testing software, you want to test the program as much as possible. The conventional way to do this is to test the application extensively through its external endpoints. These external endpoints may be a user interface or web services. They can almost always be automated, and automation is a good start.

Unfortunately, testing from the external endpoints leads to a few problems:

The diagnosis when something fails is unclear - there are too many possible sources of errors
The feedback to the developers is slow - the execution time is too long
It is hard to have confidence in the tests - it's impossible to test all combinations of user input through the system

The cure is to rely as much as you can on fast unit tests. But a unit test will only test one thing. To know if a class can collaborate with other classes, you need to test that collaboration scenario. This can lead to integrated tests that have bad diagnosis precision, are slow, and have too many execution paths.

There is one alternative, though, that many developers haven't explored enough: using unit tests with mocks and stubs in a strict way. I will explore this alternative in this post.

The unit testing paradox

A codebase with many unit tests, perhaps covering 100% of all production code, can still contain several errors. This may seem strange. We are testing every detail of the production code and still find bugs.

What is the problem? Do we have to fall back to large, end-to-end tests, to be able to solve this issue?

The problem is that even when we have used a mock to verify interaction and stubs to stub the collaborators, we may have made mistakes implementing the collaborators. We implemented the stubs to behave in a certain way, to return specific values for specific parameters, but we failed to implement the collaborators to have the exact same behaviour.

Now that the problem is known, what is the solution?

The solution is to implement the collaborators and re-use the behaviour we specified when we used the stubbed collaborators. If a stub returned 42 from a method when the method got 17 as parameter, this is the very same behaviour we want the actual implementation of the collaborator to have. This means that we need to re-use the values from when we stubbed a collaborator when we implement the tests for the actual implementation of a collaborator.
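As a minimal sketch of the 17-to-42 rule, using only the JDK: the names Collaborator, StubCollaborator and RealCollaborator are hypothetical, and the arithmetic inside the real implementation is invented purely so that 17 maps to 42.

```java
// Hypothetical collaborator contract; the name is illustrative only.
interface Collaborator {
    int calculate(int input);
}

// Stub used while testing the client: it pins down one example, 17 -> 42.
class StubCollaborator implements Collaborator {
    @Override
    public int calculate(int input) {
        if (input == 17) {
            return 42;
        }
        throw new IllegalArgumentException("No stubbed answer for " + input);
    }
}

// Real implementation: whatever it does internally, it must honour the stubbed example.
class RealCollaborator implements Collaborator {
    @Override
    public int calculate(int input) {
        return 2 * input + 8; // arbitrary rule, chosen so that 17 maps to 42
    }
}

class BalancedExampleDemo {
    public static void main(String[] args) {
        int stubbed = new StubCollaborator().calculate(17);
        int real = new RealCollaborator().calculate(17);
        // The test of the real implementation re-uses the stub's input and output.
        if (stubbed != real) {
            throw new AssertionError("Stub and implementation disagree for input 17");
        }
        System.out.println("stub: " + stubbed + ", real: " + real);
    }
}
```

The point is not the arithmetic but that the input and the expected output appear in both places: once where the stub is defined and once where the real implementation is tested.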

This may sound a bit abstract. A clarifying example is probably a good idea.

An example

Let us assume an example where we want to calculate the correct VAT date for a company in Sweden that has sent invoices within the EU. The date for paying VAT for 2015, if you had sent invoices within the EU, was 2016-02-28. That is, the 28th of February 2016.

I am interested in three things:

A VatRules engine should be used from a Business class
A stub of the VatRules engine should return 2016-02-28 for the organization number 5569215576
The actual implementation of the VatRules engine should return 2016-02-28 for the organization number 5569215576

That is

Verify that a collaborator is used
Stub the collaborator
Implement the actual collaborator and make sure that it behaves exactly as the stub

First step: Make sure the collaborator is used

My first test looks like this:

@Test
public void check_that_the_vat_rule_engine_is_used() {
    VatRules vatRules = mock(VatRules.class);
    Business business = new Business("any string", vatRules);

    business.getVatDueDate();

    verify(vatRules).getVatDueDate(anyString());
}

I create a mock of a collaborator, the VatRules. This will allow me to verify that the collaborator is called exactly one time. The verification is done in the last statement, verify(vatRules).getVatDueDate(anyString());. I don't care which organisation number the method has been called with. I only care about the method being called. This allows me to use anyString() as the argument.

If this test passes, then I know that the collaborator, VatRules, is used properly in the business object.
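The test implies a shape for the production code. A minimal sketch that would satisfy it might look like the following. The field names and the placeholder body of VatRules are assumptions; only the constructor and method signatures are taken from the test.

```java
import java.time.LocalDate;

// Placeholder collaborator: at this step only its method signature matters.
class VatRules {
    public LocalDate getVatDueDate(String organisationNumber) {
        throw new UnsupportedOperationException("not implemented yet");
    }
}

// Assumed shape of the class under test, inferred from the test above.
class Business {
    private final String organisationNumber;
    private final VatRules vatRules;

    Business(String organisationNumber, VatRules vatRules) {
        this.organisationNumber = organisationNumber;
        this.vatRules = vatRules;
    }

    // Delegates to the collaborator exactly once per call.
    public LocalDate getVatDueDate() {
        return vatRules.getVatDueDate(organisationNumber);
    }
}
```

Passing the collaborator in through the constructor is what makes it possible to hand the Business a mock in the test.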

Second step: Stub the collaborator

My next step is to stub the collaborator. The VatRules can be a cumbersome implementation that might require a lookup in something slow, say a database or a web service somewhere. How the implementation does the lookup is currently uninteresting. What is interesting is that we are able to get an expected response given a specific input. In this implementation, I expect to get the value 2016-02-28 for the input 5569215576. This is a very specific and valid case.

@Test
public void check_due_date_for_vat_when_you_invoice_within_eu() {
    LocalDate expected = LocalDate.parse("2016-02-28");

    VatRules vatRules = mock(VatRules.class);
    when(vatRules.getVatDueDate("5569215576")).thenReturn(LocalDate.parse("2016-02-28"));
    Business business = new Business("5569215576", vatRules);

    LocalDate actual = business.getVatDueDate();

    assertThat(actual, is(expected));
}

The actual creation of the stub is done with the statement when(vatRules.getVatDueDate("5569215576")).thenReturn(LocalDate.parse("2016-02-28"));. This is the way it is done in Mockito, and it can be argued that it is a bit unfortunate that stub creation is done implicitly when you force a mock to return a specific value for a specific input. An alternative could be to hand roll the stub and implement the same interface that VatRules implements. I didn't do that in this case.
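For comparison, a hand-rolled stub might look like this sketch. It assumes VatRules is a non-final class that can be subclassed (extracting an interface would work as well). StubVatRules is a hypothetical name, and Business and the VatRules body are stand-ins inferred from the tests above.

```java
import java.time.LocalDate;

// Stand-in for the real collaborator, whose lookup is not shown here.
class VatRules {
    public LocalDate getVatDueDate(String organisationNumber) {
        throw new UnsupportedOperationException("real lookup not shown");
    }
}

// Hand-rolled stub (hypothetical name): answers only the one example.
class StubVatRules extends VatRules {
    @Override
    public LocalDate getVatDueDate(String organisationNumber) {
        if ("5569215576".equals(organisationNumber)) {
            return LocalDate.parse("2016-02-28");
        }
        return null; // no answer for any other input, like an unstubbed call on a mock
    }
}

// Assumed shape of the class under test.
class Business {
    private final String organisationNumber;
    private final VatRules vatRules;

    Business(String organisationNumber, VatRules vatRules) {
        this.organisationNumber = organisationNumber;
        this.vatRules = vatRules;
    }

    public LocalDate getVatDueDate() {
        return vatRules.getVatDueDate(organisationNumber);
    }
}

class HandRolledStubDemo {
    public static void main(String[] args) {
        Business business = new Business("5569215576", new StubVatRules());
        System.out.println(business.getVatDueDate()); // prints 2016-02-28
    }
}
```

The hand-rolled version makes the stubbed input and output visible as ordinary code, at the cost of one extra class per collaborator.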

If this test passes, I know that the application should work for this specific example, provided that the actual implementation of VatRules behaves exactly like the stubbed version. My next step is therefore to implement a test for a concrete implementation of VatRules that uses the same value as I used for the stub.

Third step: Implement the collaborator

@Test
public void vat_due_date_is_28_feb_if_invoicing_in_eu() {
    LocalDate expected = LocalDate.parse("2016-02-28");

    VatRules vatRules = new VatRules();

    LocalDate actual = vatRules.getVatDueDate("5569215576");

    assertThat(actual, is(expected));
}

This test uses the exact same values as I used above when I stubbed a VatRules. This is very important and the core of this technique.

Working from the outside in without verifying that the collaborators behave exactly like their stubbed versions will lead you down a path of random errors, even though you have 100% test coverage from your unit tests.

Is this enough?

I am now in the situation that I know three things:

The VatRules collaborator is used and called exactly one time.
When the VatRules returns a specific value, the system under test acts as I want it to behave.
When the actual implementation of VatRules gets the same parameter as the stub got, it will return the same value as the stub returned.

I have confidence that my application has some basic correctness. The three issues I wanted to handle at the top are fixed.

The diagnosis when something fails is clear - there is only one reason for a unit test to fail
The feedback to the developers is fast - the execution time for the unit tests is short
I have confidence in the tests - all interesting paths through the system are verified

Is this enough for me to release this piece of software? To be honest, no. I can verify the basic correctness of the application using this technique with mocks and stubs. I do not, however, know if all pieces have been properly connected. To be able to verify that all classes are properly connected, I want to run the application and verify a few scenarios end-to-end from the external endpoints. It is enough to verify a few scenarios as long as all the classes that have to be wired are used.

Outside in

Verifying that all interesting classes are executed is hard if you try to do it last. An easier solution is to start with an acceptance test and then implement the pieces using unit tests. This is a large subject and books have been written to cover it. One of the better books is Growing Object-Oriented Software Guided by Tests by Steve Freeman and Nat Pryce.

This is also the workflow that behavior-driven development (BDD) tends to lead you into. If you are interested in learning more about BDD, please contact me.

Possible refactorings

The steps I outlined above can be used to drive the process. When it's done and you are happy that it works, it's time for refactoring. There are at least two refactorings to consider regarding this example.

Should all tests be kept?
Is there any duplication we want to get rid of?

Keeping all tests?

The first test, verifying the interaction, can be questioned in some situations. In this particular case, I am using a return value from a stub. Does this mean that I need the first test that verifies the interaction?

The answer is of course that it depends. There are situations where it is very important to know the number of interactions and make sure that any unnecessary interactions are prevented. In this case? Maybe not.
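Where the number of interactions does matter, the check can be made explicit. With Mockito, that is what verify(vatRules, times(1)) expresses. The sketch below shows the same idea with a hand-rolled recording double, using only the JDK. RecordingVatRules is a hypothetical name, and Business and VatRules are minimal stand-ins inferred from the tests above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real collaborator, whose lookup is not shown here.
class VatRules {
    public LocalDate getVatDueDate(String organisationNumber) {
        throw new UnsupportedOperationException("real lookup not shown");
    }
}

// Recording double (hypothetical name): remembers every argument it was called with.
class RecordingVatRules extends VatRules {
    final List<String> calls = new ArrayList<>();

    @Override
    public LocalDate getVatDueDate(String organisationNumber) {
        calls.add(organisationNumber);
        return LocalDate.parse("2016-02-28");
    }
}

// Assumed shape of the class under test.
class Business {
    private final String organisationNumber;
    private final VatRules vatRules;

    Business(String organisationNumber, VatRules vatRules) {
        this.organisationNumber = organisationNumber;
        this.vatRules = vatRules;
    }

    public LocalDate getVatDueDate() {
        return vatRules.getVatDueDate(organisationNumber);
    }
}

class InteractionCountDemo {
    public static void main(String[] args) {
        RecordingVatRules vatRules = new RecordingVatRules();
        new Business("5569215576", vatRules).getVatDueDate();

        // Exactly one call: a second, unnecessary lookup would fail here.
        if (vatRules.calls.size() != 1) {
            throw new AssertionError("Expected 1 call but saw " + vatRules.calls.size());
        }
        System.out.println("calls: " + vatRules.calls); // prints calls: [5569215576]
    }
}
```

A check like this earns its place when each call to the collaborator is expensive, for example a remote lookup.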

Sometimes it is ok to use the first test and force an interaction and later remove the test that forced you in that direction. The second test may cover the usage of the collaborator and therefore be enough.

What about duplication?

I am duplicating some values in the example above. Is that ok or is it a problem? I often don't see this as a big problem: the values are localized in one class and in two methods close to each other. It can, however, be argued that when using the same values in step two and step three, it would be a good idea to keep some repository with them somewhere. They could be defined as constants that are re-used.

If you feel a need to set up a repository with values to re-use, go ahead and do so. I might not. But it all depends on the situation. If the distance between the two tests is large, then maybe I would. But if you have a large distance between the steps, then you might have another problem making sure that they actually balance. And balancing the tests is much more important than worrying about duplication.
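If a shared repository of values is what you want, a minimal sketch could look like the following. VatExamples and StubVatRules are hypothetical names, and the map-backed VatRules is only a stand-in for whatever the real lookup does.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical repository for the example values shared by step two and step three.
final class VatExamples {
    static final String ORGANISATION_NUMBER = "5569215576";
    static final LocalDate VAT_DUE_DATE = LocalDate.parse("2016-02-28");

    private VatExamples() {
    }
}

// Stand-in for the real collaborator. The in-memory table is an assumption;
// how the actual lookup is done is not shown in this post.
class VatRules {
    private final Map<String, LocalDate> dueDates = new HashMap<>();

    VatRules() {
        dueDates.put("5569215576", LocalDate.of(2016, 2, 28));
    }

    public LocalDate getVatDueDate(String organisationNumber) {
        return dueDates.get(organisationNumber);
    }
}

// Step two: the stub is built from the shared example.
class StubVatRules extends VatRules {
    @Override
    public LocalDate getVatDueDate(String organisationNumber) {
        return VatExamples.ORGANISATION_NUMBER.equals(organisationNumber)
                ? VatExamples.VAT_DUE_DATE
                : null;
    }
}

class BalanceCheck {
    public static void main(String[] args) {
        String input = VatExamples.ORGANISATION_NUMBER;
        LocalDate fromStub = new StubVatRules().getVatDueDate(input);
        LocalDate fromReal = new VatRules().getVatDueDate(input); // step three

        // Both sides are compared against the same shared constant.
        if (!VatExamples.VAT_DUE_DATE.equals(fromStub)
                || !VatExamples.VAT_DUE_DATE.equals(fromReal)) {
            throw new AssertionError("Stub and implementation are out of balance");
        }
        System.out.println("stub and implementation agree on " + fromReal);
    }
}
```

With the values in one place, changing the example in the stub test without also changing it in the implementation test is no longer possible by accident.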

Integrated tests

It may not be clear to everyone what the difference is between an integrated test and an integration test. The spelling is similar, but the meaning is very different.

I use the same definition as J. B. Rainsberger does in Clearing Up the Integrated Tests Scam. That is, an integrated test is used to test many of our own classes at the same time. They do not test interaction with external systems.

Integration tests, on the other hand, test the interaction with external systems. This may mean the interaction with a database or a message queue. Anything that is not under your control. You need these tests, but they are not what I mean when I say integrated tests.

Conclusion

Basic correctness of an application can be verified using unit tests as shown above.

To be able to verify that the wiring of the application is done properly, I run a test that fires up the entire application and uses it end-to-end. This may be a slow test, but it is only a few scenarios and it can't be skipped. It is something we will have to live with.

Acknowledgements

I would like to thank Malin Ekholm, Peter 'Code Cop' Kofler, Adrian Bolboaca, and Aki Salmi for proofreading. It is great to get feedback that forces me to question my thoughts.
Thank you!
