Recently, I came across a thought-provoking sketch from sketchplanations.
Quoting from the sketch:
The risk you're taking if key people were hit by a bus.
The sketch is funny. It is.
I kid you not, this is also one of the darkest sketches I have ever seen. Let me give you an example of how dark this sketch can be in real life.
I have been in the IT industry for six years and have heard bizarre stories from the IT world. Some came from news articles, some from friends and colleagues, and some I witnessed myself.
One incident I can recall right away that's linked to the Bus Factor problem is the 'I was at my friend's place with my mobile phone switched off' problem.
I was at my friend's place with my mobile phone switched off problem
So here's the incident in highly simplified terms:
- Engineers did a hardware upgrade for a very small section of servers in the warehouse.
- They monitored the upgraded hardware for a few days and discussed it with service owners.
- After receiving good feedback from service owners, they slowly upgraded all the hardware.
- Everything worked for a week, and then suddenly all the services stopped working at once. No one knew why, and only engineers with access to the warehouse could identify what was wrong. The only problem was that the key to the warehouse was with an engineer who was unreachable due to personal commitments.
- Once the engineer with the key returned to the office, the engineers entered the warehouse and identified the issue. The fix itself didn't take long, but getting into the warehouse did, and the company suffered a significant business loss.
- What happened to the engineer with the key after this incident? We will never know!
After reading this, you might think this is a lame scenario. As lame as it sounds, this has happened. It doesn't have to be the key to a warehouse; it could be a PIN to enter a warehouse section or a password to access a server.
Depending on a single person for critical access is a major risk, and fixing it is very important for keeping the business running.
Identifying and fixing the Bus Factor problem matters not just company-wide. It is especially important for the Engineering Manager or Tech Lead, since they are held responsible for such risks when an outage happens within their team.
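One way a team could start identifying this kind of risk is to write down who can access each critical resource and flag anything that depends on a single person. Here is a minimal Python sketch of that idea. The resource names, the engineer names, and the `min_holders` threshold are all made up for illustration; they are not from the incident above.

```python
from collections import defaultdict

# Hypothetical inventory: critical resource -> people who can access it.
ACCESS = {
    "warehouse key": {"engineer_a"},
    "warehouse section PIN": {"engineer_a", "engineer_b"},
    "server password": {"engineer_b", "engineer_c", "engineer_d"},
}


def at_risk_resources(access, min_holders=2):
    """Return the resources held by fewer than min_holders people, sorted by name."""
    return sorted(
        name for name, holders in access.items() if len(holders) < min_holders
    )


def single_points_of_failure(access):
    """Map each person to the resources that only they can access."""
    result = defaultdict(list)
    for name, holders in access.items():
        if len(holders) == 1:
            (person,) = holders  # unpack the only holder
            result[person].append(name)
    return dict(result)


if __name__ == "__main__":
    print("At risk:", at_risk_resources(ACCESS))
    print("Single points of failure:", single_points_of_failure(ACCESS))
```

In this made-up inventory, the warehouse key is the only resource with a single holder, so it is the first thing to give a second person access to.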
Do you know of any stories where Bus Factor created a lot of problems? Let me know in the comments below or drop an email.