Should We Talk About the Colour of the Bike Shed?

Posted by Stephen Cook

Peer-review is a tool used to have other software developers give your code a once-over; to bring in a fresh pair of eyes to look at the problem, before dedicating a solution to the code-base. Whether or not peer-review is useful isn’t something I’m going to discuss here, and will simply take as a given that it is (see this by Jeff Atwood for a quick summary on my stance).

This article, instead, is going to discuss when peer-review goes wrong, and instead of issues like “this code won’t work when load is high” being revealed — time is wasted arguing about whether or not a particular line of JavaScript should end with a semicolon.

Forewarning #

This post contains quite a bit of me discussing the issue without ever displaying any solid data, or proposing solid solutions.

My main intention here is to make you take a step back and consider where you stand on the blurred line between being so lax that you allow code that no-one understands to be merged, and being so tight that you damage project momentum because of some minutia.

I know that I fall on the latter side of that line more often than not, and this post is mostly my trying to determine where the middle of the blurred line lies.

Enter Bike Shedding #

Bike-shedding (also known as Parkinson’s Law of Triviality) is a social phenomenon, that can easily sidetrack peer-reviews, and negate their benefits considerably. People want to contribute, but will only discuss issues they feel comfortable talking about. This leads to an interesting phenomenon where important, but complicated, issues will go without much discussion at all — but comparably trivial issues will get a lot of attention, since everyone is comfortable discussing it.

This occurs frequently in software engineering, where (for example) basic syntactic analysis is a lot easier to perform than structural analysis, and functional accuracy. Jeph Atwood wrote a post where he explains how it’s also quite fun to just talk in abstract terms, procrastinating the need to actually do something useful.

We want to avoid bike-shedding at all costs in our peer-reviews; but does that mean we shouldn’t ever comment on syntactic issues? Should we make a rule that only relatively more important things are even discussed in these reviews, so syntax is off-limits?

How Bad is Bad? #

Picture the scene: it’s 3am, you’ve just been woken up with a phone-call telling you that your servers are down. There’s a problem with some code, and the servers can’t reboot until you fix it.

You’re losing international traffic with every second that you don’t fix the issue. You track the issue down, but you need to decipher what one particular function does before you can fix it. This is what you see:

Now imagine the same scenario, but you find this code instead:

These 2 examples are semantically equivalent. They will (or, at least, they could — depending on your compiler) compile down to the same byte-code. But I bet there’s one of the programs that you would rather be looking at when the clock is ticking.

Obviously this is an exaggerated example, the first example is too terrible for anyone to truly consider writing in practice (touch wood). However, my point is that we know we shouldn’t ignore syntax entirely in our reviews. This example hopefully proves that readability of code is a concern, and that our graph looks like this:

2 points on a graph showing unreadable is unacceptable, and readable is acceptable

The only question is what the points in-between look like. If we want to be very strict on syntactic prettiness, then it would look like this:

Example curve, not progressing acceptability until around 70% readability

Or if we want our code to not be unreadable, but quickly reach a point of “good enough”, then our graph would look instead like this:

Example curve showing acceptability not increase until around 20%, but then increase of readability adds little more acceptability

How Strict to Be #

In general, syntactic points of contention are things like:

  • Class/member/function names
  • Test case descriptions
  • Function granularity
  • Block depth
  • Line length

It would be great to see some data on how important each of these things actually is (e.g. some data showing the average speed for a developer to find a bug in code with varying degrees of these points), however such data is hard to come by. (If anyone knows of any data or research around this, please let me know!)

Even if such data were available, conclusions drawn from it wouldn’t be as helpful as you might think. If statistics told us that we would experience a 10% reduction in bug-finding speed for function names at 70% in some quality metric — we couldn’t turn this into practical advice very easily.

If x is roughly 65% “good”, is that okay? That’s roughly a 15% reduction in bug-finding speed for this portion of code. How often will people be looking at bugs in this code? How long will it take to come to an acceptable variable name?

Wherever you decide to set your threshold on how strict you want to be, it’s worth keeping in mind broken window theory. It’s never just one function that gets syntactically damaged — once one part of the code gets left in disrepair, people struggle to feel the need to keep other parts in good repair.

Time Pressures #

Another variable is how important it is that the code be finished quickly. Unfortunately, we don’t write code in a bubble, and the product needs to be shipped.

Even if you don’t think your project really has a deadline, it probably does. Even without an explicit deadline, a product will always be implicitly affected by time. Mark Suster wrote an interesting post about time constraints in a project, and their importance.

Design with no constraints becomes a research project.

— Mark Suster

People in burning buildings would jump several floors into firemen’s life nets (or at least, up until the life net became obsolete in the 1980s, as I found out writing this analogy). Similarly, if your business will lose half of the market by not shipping tomorrow, then you shouldn’t refuse to allow someone to merge some code because you barely understand it. Just like the person jumping from the building — under normal circumstances it would be a terrible idea, and the idea of it may make your stomach turn, but sometimes bad choices are the best ones, given context.

This isn’t to say you should entirely do away bothering with peer-review at this point, just that the bar should be appropriately lowered. In the burning building analogy, I’m not suggesting you jump out of the window head first — simply that you shouldn’t ignore the danger of the fire just because you’re more familiar with the danger of falling. Being as cautious as you possibly can, make sure you have good footing, close your eyes, and make the jump.

However, the industry we’re in will always create time-pressures. The new product always needs to be released, the new features always needs to have been shipped yesterday. Constantly merging shoddy code will catch up with you when you need to support it as legacy code.

Choosing between merging in everything willy-nilly, and blindly maximising the x-axis of the previous graphs isn’t enough. We need to add another variable to get the whole picture — the need to release.

Graph showing a range where code quality is acceptable, and deadline is still met

This graph is largely fictional, I guessed what a realistic curve might look like for trying to improve code readability over time. However, my point is that a point of “good enough” should be decided upon, consciously on some level. This should be countered with your notion of when a release is needed by.

If you ever notice that, at the speed you’re able to work at, and the quality of code that you’re aiming for, your project looks closer to this:

Graph showing deadline and code quality bounds being too strict, and not intersecting with what's possible

Then you have a problem. Unless your project is restructured, or re-scoped in some other way; your project (for some chosen ratio 0 ≤ r ≤ 1) is going to be roughly over the deadline, and (1-r)β under the acceptable code quality bound. You will need to choose your ratio r for where you want your project to fall.

This is hard to do when the person setting the deadline and the person setting the code quality aren’t the same person. But like I said earlier, I’m only trying to make you think about the issue in a larger context, rather than propose any solid solutions.

Of course, this sort of deadline management doesn’t just apply to code readability — it applies to features, product quality, and a large variety of other things. I’m simply formalising the notion to illustrate that this balancing act is something that needs to exist in all aspects of a project. How willing you are to lower the threshold rather than the extend the deadline, should depend on how important you think that particular aspect is, in the long run.

Balancing Acts #

Before anyone notices that this post got a bit abstract, I’ll bring it back to bike-shedding.

Being aware that there are variables that need to be constantly balanced can be daunting, but is helpful to keep in mind. We don’t want to spend so long perfecting our code, that by the time it’s finally ready, the product is out-dated. In interest of this, we don’t want to spend longer than we need to in code review. However, we also don’t want to see that there’s a crunch to meet a deadline, and assume that the code quality is no longer a concern — rather than simply lowing the threshold to meet a slightly lower bar.

To wrap up, if we want to sell bike sheds, we don’t want to disregard their colour; this is an aspect of the product that will affect the sheds’ sales. However, we should always be conscious of how important the colour is on the grand-scheme of things. Discussing the colour is easier, so people will naturally try and talk for the longest on it. But if you notice that more time is being spent discussing their colour, rather than (relatively more important) topics like the materials used, or how all the pieces will go together — then it’s your responsibility to ensure that the discussion’s priorities are kept straight. Always remember that the code is just the means to an end, not the end itself. The bike-shed’s colour is relatively unimportant.

Everything in proportion — but just be sure it’s in the correct proportion.

Further Reading #

Comments are closed.