“Our engineers are excellent problem-solvers.”
When things break, the team fixes them fast. We value engineers who can debug under pressure and keep systems running. Our incident response is strong.
They’re not solving problems. They’re maintaining workarounds that became permanent.
That script someone wrote at 2am during an outage three years ago? It’s still running in production. It was never documented. It was never reviewed. But if you remove it, the billing system breaks. Your best engineer spends 30% of her time maintaining things that were never supposed to exist. She’s not engineering. She’s preserving accidents.
If your production environment depends on code nobody remembers writing, the system is running on artifacts, not architecture.
“We make careful technical decisions.”
Architecture decisions go through review. RFCs are discussed. The team values consensus and careful evaluation before committing.
The RFC has been open for four months. The team already built it their own way.
The architecture review board met three times. Each time they requested more analysis. Meanwhile the team that needed the decision couldn’t wait — they had a launch date. So they built it with what they had. The RFC is still technically open. The decision was made on the ground six weeks ago by people who couldn’t afford to wait for a process that couldn’t afford to commit.
When the review process takes longer than the build, decisions get made without it — and the process never notices.
“Our microservices architecture enables independent teams.”
Teams deploy independently. Services are decoupled. Each team owns their domain. The architecture was designed for autonomy and speed.
Every team is independent. Nothing works together.
Team A ships a breaking change. Team B finds out in production. The contract between services was documented once and never updated. The integration tests cover the happy path. The failure path — the one the customer actually hits — lives in the gap between two teams who each assume the other one handles it. Decoupling gave you speed. It also gave you seams nobody owns.
When services are decoupled but failure modes aren’t, independence is an illusion.
“Our codebase is well-documented.”
We have READMEs. We have design docs. We do onboarding for new engineers. The institutional knowledge is accessible.
The code is documented. The decisions behind it aren’t.
There’s a function that handles a very specific edge case. The comment says “DO NOT REMOVE.” Nobody knows why. The person who wrote it left two years ago. The Jira ticket it references was deleted in a cleanup. Removing it might break nothing. Removing it might break everything. So it stays. Your codebase is a library of decisions made by people who are gone, for reasons that are lost.
When the comment says “DO NOT REMOVE” and nobody knows why, the system is governed by fear of the unknown.
“We have a strong on-call culture.”
Engineers take ownership of their services. On-call rotations are fair. Incidents are learning opportunities. The team takes reliability seriously.
Three people carry the pager for everything that matters.
The rotation says eight engineers. In practice, when something actually breaks, the same three get called — because they’re the only ones who know the system well enough to fix it under pressure. The other five can follow the runbook. These three wrote the runbook. The on-call rotation is fair on paper. The cognitive load is not distributed. And those three are getting tired.
When the on-call rotation is eight people but the real rotation is three, the system is running on individuals, not infrastructure.
Every lens sees the same system. Shared language is how the system starts to learn.
These aren’t failures of people. They’re the physics of organizations operating at scale and speed.