The Bus Factor and Deploying to Production

In software development we often talk about the bus factor. It's commonly defined as the number of people who'd have to disappear before a project stalls, but here I'm using it as a measure of risk: the higher the bus factor, the more damage would be caused if a particular developer were hit by a bus. Not damage to the person - presumably there'd be a lot - but damage to the project they're working on.

The bus factor represents the risk of concentrated knowledge

The problem is knowledge concentrated in one person's brain, which would be irretrievable if they were to suddenly disappear. Unless that knowledge is no longer required, there's a big risk here. When do we usually consider it "no longer required"? When it's been turned into working code.
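One rough way to see where knowledge might be concentrated is to look at who has been changing which files. The sketch below is only a proxy, and everything in it is an assumption for illustration: the `(path, author)` input shape, the function names, and the 80% threshold are all made up, and counting changes says nothing about knowledge that hasn't reached version control yet.

```python
from collections import Counter, defaultdict


def knowledge_concentration(changes):
    """Return, per file, the top author and their share of recorded changes.

    `changes` is an iterable of (path, author) pairs, for example parsed
    from version-control history. Using change counts as a stand-in for
    knowledge is an assumption, not a measurement.
    """
    per_file = defaultdict(Counter)
    for path, author in changes:
        per_file[path][author] += 1

    report = {}
    for path, authors in per_file.items():
        top_author, top_count = authors.most_common(1)[0]
        report[path] = (top_author, top_count / sum(authors.values()))
    return report


def at_risk(changes, threshold=0.8):
    """List files where one author made at least `threshold` of the changes."""
    return sorted(
        path
        for path, (_, share) in knowledge_concentration(changes).items()
        if share >= threshold
    )
```

Fed a list of changes, `at_risk` returns the files that lean heavily on a single person. Those are reasonable candidates for a pairing session or a code review with someone new to that area.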

Consider a developer who is halfway through a big feature. Because they've been deep in that feature for a while, they're likely to be across all the complexities, moving parts, and the considerations that need to be taken into account. But once the feature has been written, tested, deployed and proven, the knowledge loses value rapidly.

We can think about this in lean manufacturing terms: knowledge gets turned into code, and until the code is complete, it's work in progress. In lean, work in progress represents risk - just like this concentrated knowledge.

Work in progress represents risk

As an industry, we try to mitigate the risk of concentrated knowledge by sharing it between team members in various ways. Classically this involves documentation. There's usually some formal process to force developers to update documentation when they make a change (because documentation is super boring). However, studies have shown that software documentation is rarely kept up to date anyway, so the problem persists.

More "agile" teams try to share knowledge by making sure more than one developer is involved at all times. Pair programming and code reviews are examples of techniques used here. In addition to sharing knowledge, studies have shown that code quality can be better in some circumstances, particularly with complex tasks and junior or intermediate developers.

While the documentation tries to capture this knowledge so it's accessible long down the track, this has questionable value. A) It's probably out of date, and B) when was the last time you referred to old documentation for a project?

Still, it's a start. We're trying to mitigate the risk of concentrated knowledge.

Documentation tries to capture concentrated knowledge

What about the code?

So great, we have techniques for lowering the bus factor by sharing knowledge. Once our feature is complete and all that knowledge has been embedded into code, we're out of the woods, right?

Well, no.

How much use is our code unless it's being used?

I'd argue that, at least from the customer or end-user's point of view, code that hasn't made it to production is effectively 0% done. In fact, I have argued that. Several times.

Code not in production is 0% done

And of course, code we haven't deployed carries its own risk. In effect, un-deployed code has its own bus factor. This is true for a couple of reasons:

First (and least likely), something could happen to all that code before you manage to deploy it. Servers go down, and hard drives fail. That's why, as developers, we try to push code to the server(s) frequently and avoid leaving changes on our own machines.

Second, remember I said knowledge rapidly loses value once the code is complete? Well, it doesn't lose value immediately. You may think a feature is done, but until it's being used in anger in the live environment, you can't be sure!

I've had many experiences where a bug has arisen months after I finished the work. Even though I was the original author, my knowledge of how everything fitted together had faded. Rapidly. I had to relearn how it all worked - which is the same problem the team would have had if I'd been hit by a bus.

Bugs can often present themselves only after code is in production

If code makes it into production very quickly after it's been developed, it's still in the developer's mind. Any bugs that arise are likely to be fixed much more quickly.
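If you want to keep an eye on this, one option is to track how long finished work sits around waiting for production. Here's a minimal sketch of that idea. The `(name, finished_at, deployed_at)` tuples, the function names, and the seven-day limit are all hypothetical choices for illustration, so pick whatever fits your own team.

```python
from datetime import datetime, timedelta


def deploy_lag(finished_at, deployed_at, now):
    """Time between a change being finished and reaching production.

    If the change has not been deployed yet (`deployed_at` is None),
    the lag is measured up to `now` instead.
    """
    end = deployed_at if deployed_at is not None else now
    return end - finished_at


def undeployed_too_long(changes, now, max_lag=timedelta(days=7)):
    """Names of finished changes that have waited longer than `max_lag`.

    `changes` is an iterable of (name, finished_at, deployed_at) tuples,
    where `deployed_at` is None for work that is not in production yet.
    """
    return [
        name
        for name, finished_at, deployed_at in changes
        if deployed_at is None
        and deploy_lag(finished_at, None, now) > max_lag
    ]
```

Anything this returns is work that's finished but still fading from its author's memory. The longer the list, the more relearning you're signing up for when the bugs eventually show up.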

So what's the solution?

It's pretty clear to me:

Get features finished, and put them in production.

Anything else just prolongs work in progress and increases the bus factor.

Damian Brady

I'm an Australian developer, speaker, and author specialising in DevOps, MLOps, developer process, and software architecture. I love Azure DevOps, GitHub Actions, and reducing process waste.
