Making Bold Decisions as a Software Engineer
Early in my career, I would've spent months patching the legacy system. It was a Frankenstein's monster of code, with patches on top of patches. Every time we pushed a new feature, a different part of the system would break.
But then I took a step back and asked: what could go wrong if we don't rewrite this monster? The answer was clear: the system would eventually collapse, and our entire product would be at a standstill. The fragility itself was a bigger risk than a full rewrite.
On the other side, I once worked on a codebase that depended heavily on a package, and it worked fine. Then a more modern package was released: promising, simpler to use, better performance, actively maintained. But adopting it required a really hard refactor.
I was obsessed with replacing that package, and so I did. I spent weeks testing everything and changing huge parts of our codebase. Once it hit production, a major flow stopped working. Luckily, we reverted fast, but I was pulled off the project afterward.
Both stories taught me the same lesson: Risk is part of every decision. The question is not whether there is risk, but how much, how to manage it, and what you gain from it.
One of the most significant changes I've made in my career as a software engineer was learning how to make bold decisions: making stepping out of my comfort zone part of my comfort zone.
Why was this so significant?
- I used to obsess over every little detail that might go wrong. That often left me in analysis paralysis, or focusing on low-impact issues.
- Once I got more comfortable making bold moves, I also started asking: what could go wrong if I don't do this?
Now, before you think I'm out here YOLO'ing into production: nope 🤟
Engineers should be a little pessimistic, but not only pessimistic. I still think about edge-case race conditions, scaling issues, the "what happens in 2-3 years?" questions. But right after that, I balance them with a different set of questions:
- How certain am I about this failure actually happening?
- If it does fail, how bad is the impact?
- How easy will it be to monitor and maintain it in the long run?
- Complexity adds risk. How much complexity do I add if I try to prevent the failure upfront or add reactive mitigation mechanisms?
- What happens if I don't make this change?
- And finally: what do other engineers think? (ADR docs are gold here, probably worth a future post on that)
Think of risk as: Severity × Probability
When both severity and probability are high, I spend more time thinking. But if you find yourself constantly in "high risk" mode, that's a signal too. Are you exaggerating? Or is your system actually fragile? Fragility itself is a risk, one that often requires bold decisions to fix. (There are ways to measure it, like DORA metrics, another future post here 🤔)
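The Severity × Probability model above can be sketched as a tiny triage helper. This is a minimal illustration, not something from a real incident process: the 1-5 scales and the thresholds are assumptions you'd tune to your own team's risk appetite.

```python
def risk_score(severity: int, probability: int) -> int:
    """Risk as Severity x Probability, each rated 1 (low) to 5 (high)."""
    return severity * probability

def triage(severity: int, probability: int) -> str:
    # Illustrative thresholds only: calibrate them to your own system.
    score = risk_score(severity, probability)
    if score >= 15:
        return "high: spend real time de-risking before you commit"
    if score >= 6:
        return "medium: add mitigations (flags, rollout plan) and proceed"
    return "low: just do it"

print(triage(severity=4, probability=4))  # score 16 -> high
print(triage(severity=2, probability=2))  # score 4 -> low
```

The point isn't the exact numbers; it's that writing the two factors down separately keeps you from treating a scary-sounding but improbable failure the same as a likely, damaging one.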
Of course, this framework can itself become overwhelming, analysis paralysis 2.0 😵💫
But the point isn't to perfect it. The point is to look the hard decision in the eyes, break it down, and ask: is it too risky? No? Let's do this! Yes? Can you reduce the risk? Is it worth it?
It's the same principle as breaking down large tasks into subtasks, iterating in agile, or reasoning step by step in LLMs. Narrow your focus to make the big picture clearer.
What can you do to make bold moves less sketchy? You can set up good detection with alerts, dashboards, and monitors to catch issues early. You can take proactive steps like feature flags, rate limiting, or gradual rollouts to lower exposure. And you can build in reactive measures such as self-healing systems, rollback plans, and clear incident playbooks.
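One of those proactive steps, a gradual rollout, can be sketched in a few lines. This is a hedged example, assuming simple hash-based bucketing; the function name, flag name, and user IDs are all hypothetical.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in a 0-99 bucket for this feature.

    Hashing user_id together with the feature name keeps the decision
    stable across requests (a user doesn't flip in and out) while giving
    different features independent buckets.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Ship the bold change behind a flag at 5%, widen as monitors stay green.
enabled = in_rollout("user-42", "new-billing-flow", percent=5)
print("new flow" if enabled else "old, known-good flow")
```

Raising `percent` step by step (5 → 25 → 100) turns one big bet into a series of small, observable ones, which is exactly the kind of risk reduction the framework above is asking for.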
Taking risks is part of making progress, but you should make sure your team can detect issues quickly, contain the damage, and recover fast when things go wrong.
Developing confidence in your decisions is a skill that becomes second nature with practice. By approaching your next tasks with a little more consideration, you may find yourself moving more quickly, or you might uncover a clever method to improve future outcomes.
Is there something in your system you've always wanted to improve but felt hesitant to try? Maybe it's time to break the risk down into smaller steps and take that bold step, or clearly decide it's too risky 👀