It Didn’t Suddenly Break
Most organizational failures aren’t events. They’re accumulations that finally became visible. The post-mortem found the break. It didn’t find the breaking.
Post 1 — Tuesday, April 7
[It Didn’t Suddenly Break - Part 1 of 6]
The project collapsed. The team was blindsided. Leadership called an emergency meeting to figure out what went wrong.
They found it. A missed deadline here. A dropped handoff there. The moment things went sideways, identified and documented.
The timeline made sense. The cause was clear. The team left the meeting knowing what had happened.
Six months later, a different project. Same collapse. Same emergency meeting. Same findings, different names on the document.
Here’s what nobody said in either meeting: the project didn’t fail the week it failed. It was failing for months before that. The missed deadline wasn’t the cause — it was the announcement. The dropped handoff wasn’t where things went wrong — it was where things finally became visible.
The post-mortem found the event. It didn’t find what generated the event.
This is the standard failure mode in organizational diagnosis. Not incompetence, not bad faith — just the way human attention works. We are exquisitely tuned to notice when things break. We are poorly tuned to notice the breaking.
Events are loud. Accumulations are quiet. Your detection systems are optimized for loud.
Which means every organization has things that are quietly breaking right now, and they won’t appear in any report until they finish breaking.
The question isn’t whether you missed the crisis. You didn’t. You saw it the moment it became a crisis.
The question is what you were watching before that.
It didn’t suddenly break. It was breaking all along.
Post 2 — Wednesday, April 8
[It Didn’t Suddenly Break - Part 2 of 6]
Someone in a meeting said: we need to manage X. Get me data.
X was hard to measure. So the team found something adjacent — easier to pull, close enough to proxy for the real thing. They built a dashboard. The boss looked at it and said: excellent. Now we can really stay on top of things.
Here’s what happened in that gap between “adjacent” and “excellent”: the proxy became X. Nobody decided that. It just happened — because the dashboard existed, the reviews referenced it, the reports cited it, and the original question quietly stopped being the thing anyone was actually tracking.
You can’t manage what you can’t measure. True. But just because you can measure it doesn’t mean it’s the thing worth managing.
This is how organizations end up with monitoring systems that are working perfectly and still get blindsided. The systems are aimed at what was measurable, not what was consequential. And over time, through ordinary use, the substitution gets forgotten. The proxy becomes the thing. The dashboard becomes the reality.
The failure that blindsides you doesn’t cross a threshold suddenly. It accumulates gradually in the gap between what you’re measuring and what actually matters. And gradual accumulation in that gap doesn’t trip event-detection systems. It just looks like background noise — until the day it doesn’t.
Think about the last major failure your organization experienced. Go back six months before the event. Twelve months. The signals were there, not as events but as slow drift. Metrics that were technically green but trending the wrong way. Relationships that were functional but fraying. Processes that were compliant but increasingly brittle.
Nothing had broken. Everything was breaking.
The monitoring caught the break. It was never designed to catch the breaking.
The gap between what you measure and what matters is where most organizational risk actually lives.
Post 3 — Thursday, April 9
[It Didn’t Suddenly Break - Part 3 of 6]
Picture a slope loaded with snow over a long winter. Each snowfall adds a little more. Most days, nothing happens. The slope looks stable. The measurements look normal. And then one ordinary day, one ordinary snowfall, and the whole thing moves.
The avalanche didn’t start that day. It started the first time snow fell and didn’t fully consolidate. It built through every storm that winter. The triggering snowfall wasn’t special — it was just last.
This is how most organizational failures actually work. Not a single cause. An accumulation that finally exceeded the capacity of the system to absorb it. The trigger gets named in the post-mortem because it was last and visible. The loading that made the trigger consequential goes unnamed because it was slow and ordinary.
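A minimal sketch of that idea, under loud assumptions: the snowfall sizes and the capacity figure are invented for illustration, and a real slope (or organization) is nothing like this tidy. The function name `first_failure` is hypothetical. The point it shows is narrow: when every increment is the same size, the one that gets blamed is simply the one that arrived last.

```python
def first_failure(loads, capacity):
    """Return (index, cumulative_load) for the increment that first pushes
    the running total past capacity, or None if capacity is never exceeded."""
    total = 0.0
    for index, load in enumerate(loads):
        total += load
        if total > capacity:
            return index, total
    return None


# Hypothetical winter: twenty snowfalls, every one the same ordinary size.
snowfalls = [1.0] * 20
result = first_failure(snowfalls, capacity=14.5)

if result is not None:
    trigger_index, total_load = result
    # Share of the total load contributed by the "cause" named in the post-mortem.
    trigger_share = snowfalls[trigger_index] / total_load
    print(f"Snowfall #{trigger_index + 1} triggered the slide "
          f"but carried only {trigger_share:.0%} of the load.")
```

The trigger here is indistinguishable from the fourteen snowfalls before it. Only its position in the sequence makes it look like a cause.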
Here’s the diagnostic. Three questions.
- Does your organization have a formal process for reviewing events — incidents, failures, near-misses — but no equivalent process for monitoring slow drift in leading indicators?
- When something fails, does the post-mortem reliably identify a proximate cause — a decision, a person, a moment — without examining what made that cause consequential?
- For the dashboards your organization runs regularly: has the original question been forgotten, or has no one checked whether the data currently being tracked still answers it?
If two or more of those are yes, your slope is loaded.
You’re not missing events. Your detection systems are fine. The substitutions they’re running on are the problem.
What is accumulating in your organization right now that everyone is managing as background noise — and how long has it been accumulating?
Post 4 — Tuesday, April 14
[It Didn’t Suddenly Break - Part 4 of 6]
Organizations aren’t blind to accumulation by accident. They’re blind to it by design.
Not deliberate design. Structural design. The way incentives, reporting cycles, and accountability mechanisms have been built, over years and through ordinary decisions that each seemed reasonable, produces an organization that is systematically better at responding to events than at monitoring load.
Consider how performance is measured. Quarterly cycles. Annual reviews. Metrics that reset. The structure rewards response — you get credit for fixing the thing that broke. It doesn’t reward the slower work of monitoring what’s bending before it breaks. That work is invisible until it succeeds, and invisible in a different way when it fails.
Consider how escalation works. Problems surface when they cross a threshold — when someone raises a flag, when a metric goes red, when a deadline breaks. Below the threshold, the problem doesn’t exist organizationally, even if it exists operationally. The system is designed to become aware of things at the moment they’re already critical.
Consider where senior attention goes. Leadership time is finite and pulled toward events — the crisis that needs a decision, the opportunity that needs a response. The slow variables get managed at lower levels, if they get managed at all, by people without the standing to escalate something that hasn’t broken yet.
None of this is a failure of intention. It’s a consequence of how organizations were built to operate in environments where events were the primary signal.
That design has one thing right. Events are real signals. The problem is they’re late ones.
By the time the event is visible, the loading that made it inevitable is already complete.
The structure that was supposed to help you respond faster is part of what kept you from seeing sooner.
Post 5 — Wednesday, April 15
[It Didn’t Suddenly Break - Part 5 of 6]
So what does it actually look like to watch the breaking instead of waiting for the break?
It doesn’t look like more dashboards. More dashboards aimed at the same things just give you a faster view of the same blind spot.
It looks like changing what you treat as signal.
The metrics your organization tracks are almost certainly outcome metrics — things that tell you what happened. What you need alongside them are load metrics — things that tell you what’s accumulating. Not whether the system failed, but whether the system is under increasing strain. Not whether the relationship broke, but whether it’s been fraying. Not whether the process produced an error, but whether the error rate has been drifting for six months while staying technically within tolerance.
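One way the outcome-versus-load distinction could be sketched, assuming a metric where higher is worse and a fixed tolerance line. The names `slope` and `load_status`, the 2.0% tolerance, the six-month horizon, and the error rates are all hypothetical. An outcome check only asks whether the latest value is over the line. A load check also asks where the trend is heading.

```python
def slope(values):
    """Least-squares slope of values against their position (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def load_status(values, tolerance, horizon):
    """Classify a higher-is-worse series.

    'red'      -> the latest value already exceeds tolerance (the event)
    'drifting' -> still within tolerance, but the linear trend would
                  exceed it within `horizon` more periods (the load)
    'steady'   -> neither of the above
    """
    latest = values[-1]
    if latest > tolerance:
        return "red"
    trend = slope(values)
    if trend > 0 and latest + trend * horizon > tolerance:
        return "drifting"
    return "steady"


# Hypothetical monthly error rates (%). Every month is under the 2.0% line,
# so a threshold-only dashboard shows green the whole time.
error_rate = [1.0, 1.1, 1.3, 1.4, 1.6, 1.8]
print(load_status(error_rate, tolerance=2.0, horizon=6))
```

A straight-line projection is a crude stand-in for judgment. It cannot prove a failure is coming. It only marks the series as worth a closer look.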
Correlation isn’t causality. But it’s a great place to start looking. You can’t prove the drift will produce a failure. You can notice it’s the kind of thing that tends to precede failures. That’s enough to warrant a closer look — and in most organizations, a closer look is exactly what the structure is designed to avoid.
The distinction between outcome and load sounds simple. It’s organizationally hard, because load metrics don’t produce clean events. They produce gradual trends that require interpretation, that reasonable people can read differently, that don’t generate the urgency that makes action easy to justify. You will have to make a decision based on something that hasn’t broken yet.
That’s the practice. Not waiting for the threshold crossing. Asking, on a regular cadence, what is currently bending that we have been treating as background noise.
Three questions worth building into whatever review process you run.
- What has been trending in the wrong direction for longer than one quarter?
- Where are we describing something as stable when what we mean is it hasn’t failed recently?
- What would we find if we looked closely at the thing we haven’t looked closely at in a while?
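The first of those questions can be asked mechanically. A rough sketch, assuming monthly readings and treating "longer than one quarter" as more than three consecutive months of worsening. The function name `worsening_streak` and the metric names and numbers are made up for illustration.

```python
def worsening_streak(values, higher_is_worse=True):
    """Count the most recent consecutive period-over-period changes
    that moved in the wrong direction."""
    streak = 0
    # Walk backward through adjacent pairs: (second-to-last, last), and so on.
    for previous, current in zip(reversed(values[:-1]), reversed(values[1:])):
        worse = current > previous if higher_is_worse else current < previous
        if not worse:
            break
        streak += 1
    return streak


# Hypothetical monthly readings. Neither series has tripped an alert.
metrics = {
    "handoff_delay_days": [2, 2, 3, 4, 5, 6],
    "open_defects": [40, 42, 39, 41, 40, 42],
}

MONTHS_PER_QUARTER = 3
flagged = [
    name for name, series in metrics.items()
    if worsening_streak(series) > MONTHS_PER_QUARTER
]
print(flagged)
```

The second and third questions don’t reduce to a function. They need someone willing to say out loud that "stable" has been standing in for "hasn’t failed recently."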
You already know how to respond to events. Every organization does.
The break was always downstream of the breaking. Start watching upstream.
Post 6 — Thursday, April 16
[It Didn’t Suddenly Break - Part 6 of 6]
Here is what the pattern has been showing you.
The event was real. The trigger was real. The moment things finally broke was exactly when it looked like things broke.
And the same failure will happen again — different project, different team, different name on the post-mortem — because the accumulation that made the trigger consequential is still the thing your organization doesn’t know how to see.
The post-mortem found the break. It didn’t find the breaking.
This is not a monitoring failure. It’s a framing failure. The question your organization knows how to ask is: what caused this? The question it needs to learn to ask is: what was this the result of? The first question has an event as its answer. The second has an accumulation — and accumulations can be interrupted, if you catch them while they’re still building.
The analyst who can read drift instead of events, who can ask what the system is loading rather than what the system produced, who can escalate something that hasn’t broken yet — that analyst is doing something most organizational structures are not built to support. They’re watching the slope while everyone else is waiting for the snow to move.
That’s the practice. Not better event detection. Earlier load monitoring. Not faster response. Longer anticipation.
The break was always downstream of the breaking. Start watching upstream. The flash flood is coming. Sooner or later.
>>>>> It didn’t suddenly break. It was breaking all along. <<<<<