Code Coverage: How Much Is Enough?

Posted by Stephen Cook

Code coverage is often used in practice as a bar-raising metric: hard thresholds are set, enforcing that a certain percentage of the code must be exercised by tests.

Here I will go over why this is a problem, why it happens anyway, and what the alternatives are.

What is Code Coverage? #

I’m assuming that if you’re reading this, you have a fair understanding of what code coverage is. If not, it’s worth reading an introduction to the topic first.

The Problem With the Question “How Much?” #

Asking “how much code-coverage should a project have?” is like asking “how many lights should a room have?”. How many lights a room should have depends on many things. How bright are the lights? How big is the room?

But let’s not just speak in metaphor. Why might a particular application require more code coverage than another? Not just in cases where one project is more important to get right (e.g. software for the military), but in cases where two projects are equally important, yet one is less testable.

The question ultimately comes down to how much the code can be unit tested (to a useful degree). An easy example of this is code that handles UI, e.g. the following code:

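For concreteness, here is a minimal sketch of the kind of method in question: a container that centres each of its child elements horizontally. The class and method names (`Element`, `Container`, `centreElements`) are illustrative, not from any particular UI framework.

```java
import java.util.ArrayList;
import java.util.List;

// A bare-bones UI element: a position and a width.
class Element {
    int x;
    private final int width;
    Element(int width) { this.width = width; }
    int getWidth() { return width; }
}

// A container that centres each of its children within itself.
class Container extends Element {
    private final List<Element> children = new ArrayList<>();
    Container(int width) { super(width); }
    void add(Element el) { children.add(el); }

    void centreElements() {
        for (Element el : children) {
            // Place el so its left edge sits half its width left of centre.
            el.x = this.getWidth() / 2 - el.getWidth() / 2;
        }
    }
}
```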
How would we go about testing this method? We would need to test that, after the method was called, all of the elements are in the centre of this. However, about the only way to do that in a unit test is to check that el.x == this.getWidth() / 2 - el.getWidth() / 2, which simply restates the implementation.

You can argue that this sort of unit test isn’t entirely pointless, but there’s no denying that it isn’t really testing that our program is correct. The more likely problems will stem from a misunderstanding of how centring is achieved: for example, if this itself isn’t centred in the screen that the user sees; or if it turns out the origin of el is actually its centre, not its left-most point, meaning the code should have been el.x = this.getWidth() / 2.

Code like this isn’t greatly improved by unit tests. Only black-box testing (whether manual, or automated in some guided way) will catch these kinds of problems.

You could argue that code coverage should take into account the work done by black-box and integration testing too; however, this is only a minor improvement. Another simple example of code that unit tests won’t help with is DTOs (data transfer objects). Testing that setX and getX correctly set and get x is an unnecessary and unhelpful use of anyone’s time.
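To make the DTO point concrete, here is the sort of test being criticised, sketched with a hypothetical PointDto. Unit-testing a getter/setter pair only restates code that cannot meaningfully fail:

```java
// A hypothetical DTO: no logic, just a field with accessors.
class PointDto {
    private int x;
    public int getX() { return x; }
    public void setX(int x) { this.x = x; }
}

class PointDtoTest {
    // This "test" bumps the coverage numbers, but it is effectively
    // checking that field assignment works, which is not where bugs live.
    static void testSetAndGetX() {
        PointDto dto = new PointDto();
        dto.setX(42);
        assert dto.getX() == 42;
    }
}
```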

The Problem With the Answer “X%” #

I expect a high level of coverage. Sometimes managers require one. There’s a subtle difference.

– Brian Marick

Martin Fowler has written a very insightful post on code coverage, where he emphasises the importance of using code coverage to find areas of untested code, as opposed to using it as a deciding metric to meet some quality target.

Fowler discusses how a problem with enforcing a certain level of code coverage is that people will strive to meet the coverage number, without striving to write insightful unit tests. Code coverage has no means of distinguishing tests that are useful from tests that exist for no reason other than to raise the coverage rating. Consider, for example, the case of assertion-free testing, where “tests” call all the methods in a class but never once assert anything about its behaviour.
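As a sketch of what assertion-free testing looks like (with a made-up Account class): the test below executes every method, so a coverage tool reports the class as fully covered, yet even a badly broken implementation would pass it.

```java
// A small class with actual behaviour worth testing.
class Account {
    private int balance;
    void deposit(int amount) { balance += amount; }
    void withdraw(int amount) { balance -= amount; }
    int getBalance() { return balance; }
}

class AccountTest {
    // Calls every method of Account (100% coverage), asserts nothing.
    static void testEverything() {
        Account a = new Account();
        a.deposit(100);
        a.withdraw(30);
        a.getBalance(); // result silently discarded
    }
}
```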

Another problem can show itself when working in larger teams. Consider the hypothetical example where you’re a developer working on a large package that enforces 70% code coverage. At the start of the week, the coverage is 80%. As the week progresses, other developers commit large blocks of code with no unit tests, and the coverage slowly decreases. Because the coverage started at 80%, these developers can quite happily commit untested code: the coverage creeps closer and closer to 70%, but never dips below it.

Finally, at the end of the week, you go to commit your code: a DTO. But as you run the coverage checker, you see that your DTO decreases coverage from 70% to 69.9%. This leaves you three options:

  1. Decrease the coverage threshold
  2. Write unit tests for code you did not write
  3. Write unit tests for your DTO

The first option largely defeats the point of having a threshold at all, if it can simply be lowered. The second option will cost your team development time, as it will take you longer to write these tests than it would have taken the person who wrote the code in the first place. The third option, as previously discussed, is largely a waste of time.
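The arithmetic behind this scenario is worth spelling out. With some made-up line counts, a package sitting exactly on its threshold is tipped under it by one small untested commit:

```java
class CoverageMath {
    // Line coverage as a percentage.
    static double coverage(int coveredLines, int totalLines) {
        return 100.0 * coveredLines / totalLines;
    }

    public static void main(String[] args) {
        int covered = 7000, total = 10000;   // exactly 70.0% before the commit
        int dtoLines = 15;                   // a small DTO, with no tests
        System.out.println(coverage(covered, total));            // 70.0
        System.out.println(coverage(covered, total + dtoLines)); // just under 70
    }
}
```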

What is the Average Code Coverage? #

Even if we just expect good coverage rather than enforce it, there’s still the question of how much we should expect. A reasonable starting point is to look at the average code coverage in industry.

Understandably, however, statistics like these are very hard to come by – companies aren’t going to be eager to reveal their code coverage in any fashion, unless it’s 100% (and any company that enforces 100% code coverage is not going to get anywhere very fast).

As such, I have looked at popular open-source projects, and run code coverage analysis tools on them myself. Unfortunately, since this task quickly becomes tedious (and I’m not sure I know a good way to automate it), the sample size is quite small (if anyone knows of a larger sample existing elsewhere, please let me know!). I also included data from Google, which published a high-level description of some of its projects’ coverage here. That said, it gives at least a rough idea of how much code coverage might exist in industry.

(You can see my raw collected data here).

The median of this data is 76.9%. However, you can also see a large amount of deviation around this figure, which supports the assertion that different projects will require (or even reasonably allow) different amounts of coverage.

Why is Code Coverage Enforced? #

On the flip side, there are several reasons why people come to depend on specific code coverage thresholds.

Spotting a lack of coverage in a code review can be hard, or at least the tooling for it isn’t good enough in practice. When working in a large team, it’s unlikely you can take part in every code review, so if you’re concerned that people aren’t going to be rigorous enough with their testing, then enforcing a high level of testing across the package as a whole is a reasonable decision to come to.

In theory, some code might be committed with poor testing, but in the grand scheme of things the testing has to be reasonably good, otherwise the coverage checks would not pass. Seeing that the coverage is at X% can be comforting, and is a quick and easy statistic for judging how robust a particular package is.

However, as discussed previously, this theory does not hold. Enforcing a certain level of coverage will, in all likelihood, not improve the robustness of a package at all. If anything, the package can be made less robust by the bloating of test code, and by modifying code purely to make it more easily testable (e.g. making private methods public, just so they can be called directly in a unit test).
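To illustrate that last anti-pattern with a hypothetical PriceCalculator: the discount helper below should be a private implementation detail, but has been made public purely so a test can call it directly and claim its lines as covered.

```java
class PriceCalculator {
    // The actual public behaviour of the class.
    public int totalPence(int unitPence, int quantity) {
        return applyBulkDiscount(unitPence * quantity, quantity);
    }

    // Was private: widened to public only to be reachable from a unit test,
    // exposing an internal detail that callers may now depend on.
    public int applyBulkDiscount(int pence, int quantity) {
        return quantity >= 10 ? pence * 90 / 100 : pence;
    }
}
```

Note that testing totalPence alone would have exercised the helper anyway; the visibility change buys coverage bookkeeping, not robustness.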

Alternatives #

If enforcing X% coverage doesn’t encourage good coverage testing (or rather, if it actively discourages it), then how can we encourage it instead?

Improving the code review stage of the process is an important step in improving code robustness. Including coverage reports as a standard part of code review, or using tools like diff-cover, can help reviewers quickly notice when coverage is lacking.

Of course, all of this is still prone to developers simply writing unhelpful tests for no reason other than to inflate the coverage rating. This again requires vigilance in code review: useless tests should be called out as such.

This sounds a lot worse than simply enforcing a coverage threshold, but it’s not. Rather, it’s admitting that the problem is a hard one to solve, and requires a lot of work. Doing otherwise is simply pretending you have a silver bullet, when you don’t.

Conclusion #

Code coverage is a very useful tool, but only for highlighting when code might require more testing. Using it to enforce a threshold of coverage is not a good idea, given that it encourages developers to waste time writing tests for the sake of writing tests, rather than writing useful tests.

Coverage tooling has come a long way, and it still has a long way to go. But what’s clear right now is that there is no easy way to determine how well tested a package is. We shouldn’t pretend otherwise.

Learn to be dubious of boasts of high code coverage.
