Velocity Theater and the Cost of Technical Debt

Velocity Theater and the Cost of Technical Debt argues that most technical debt is not caused by careless engineers, but by sprint-driven incentives that prioritize short-term delivery over long-term system health. In typical agile environments, refactoring and architectural improvements rarely win sprint capacity because they do not produce immediately visible output, which leads teams to make individually rational shortcuts that compound into systemic friction over time. While metrics such as burndown charts and velocity suggest healthy delivery, early warning signs like rising estimates, aging pull requests, and repeatedly reopened tickets signal accumulating debt beneath the surface. AI coding tools further amplify whatever practices already exist, accelerating both quality and debt depending on the team's foundations. The solution is not simply paying down debt later, but redesigning processes to make debt visible, protect refactoring capacity, and coordinate action on delivery signals before compounding costs undermine the roadmap.

March 3, 2026

Velocity Theater and the Cost of Technical Debt

Most delivery problems do not announce themselves. The sprint board closes. The burndown chart looks fine. The retrospective notes "good delivery." And somewhere underneath all of it, the codebase quietly gets harder to work with every week.

The standard framing treats this as an engineering problem. But the engineers are not the ones setting the incentives. The sprint model is. And the sprint model, as most teams actually run it, creates conditions where cutting corners is the rational choice, sprint after sprint, until the compounding starts.

Most content on technical debt asks how to pay it down. That is the wrong starting question for a PM. The right question is whether anything in your current process stops accumulation in the first place. For most teams, the honest answer is no.

The Problem Agile Is Actually Creating

Technical debt is usually framed as an engineering problem. The codebase gets messy, shortcuts accumulate, someone eventually has to clean it up. That framing is accurate but incomplete. It locates the cause inside engineering decisions without examining the system that shapes those decisions.

The sprint is a pressure system. It creates a fixed time box, a committed scope, and a binary outcome: the story closes or it does not. That pressure is genuinely useful for focus. It is structurally hostile to the work that prevents debt, because refactoring, test coverage improvement, and architectural simplification almost never produce a shippable story within two weeks. They improve the system's capacity to produce future stories. They do not close the current one.

So they get deferred. Not because engineers are careless or lazy. Because the system's incentives point clearly in one direction and the engineers are responding rationally to them. The skip-the-refactor decision, made individually, is sensible. Made collectively, across twelve sprints, it produces a codebase that functions, ships, and quietly becomes more expensive to work with every week.

In most organizations, that cost does not appear as a line item. It appears as rising estimates, frustrated engineers, and a roadmap that keeps slipping despite everyone working hard.

The Sprint Pressure Trap

When the only two outcomes that matter are "shipped" or "not shipped," engineers make individually rational decisions that are collectively catastrophic.

They skip the refactor because the sprint ends Friday. They write the test for the next sprint. They copy-paste rather than abstract because abstraction takes two hours the sprint does not have. They leave the comment that says "TODO: fix this properly," and the entire team understands that it will not be fixed next sprint either. Each of these decisions is defensible in isolation. Together, over a quarter, they produce a codebase where every ticket costs more than it should and nobody can fully explain why.

The trap for PMs is that none of this is visible in the tooling. Story points close. The burndown chart is clean. The sprint retrospective notes "good delivery." What the retrospective does not capture is the workaround that made the story closeable, or the test coverage that got skipped to make the Friday deadline, or the module that now has three engineers afraid to touch it. The feedback loop that would surface this information is broken at the reporting layer, and by the time the debt becomes undeniable in estimates and timelines, it has usually been compounding for months.

This is where sprint pressure intersects with a deeper organizational problem: information about codebase health rarely travels upward. Engineers raise it in standups. It gets acknowledged. It does not get sprint capacity. Over time, engineers stop raising it, and the PM is left making roadmap commitments against a codebase whose real cost is invisible to them.

Why Refactoring Never Wins the Sprint

Some version of this scene plays out in every sprint planning meeting on every team under delivery pressure. The refactor almost always loses, and not because the team does not care.

A developer raises a refactoring need, usually attached to a story touching a known problem area. The estimate with the refactor is larger than the estimate without it. The sprint has committed scope. The stakeholder has a demo. The refactor gets logged, tagged, assigned a priority, and placed behind every story that will be committed at the next planning meeting. It never comes back.

There is no velocity credit for making the codebase easier to change. No burndown chart for debt reduction. No signal that identifies a module as a growing liability before the cost becomes undeniable. What does not get measured does not get prioritized, and what does not get prioritized accumulates.

Below a certain threshold, engineers absorb the friction. Workarounds exist. Delivery continues. Above that threshold, every feature becomes a mini-project, every bug fix introduces new risk, and the team's capacity to ship anything new is partially consumed by the cost of maintaining what already exists. That threshold arrives much earlier than most PMs expect. It is almost always visible in rising estimates and team behavior well before it shows up in any dashboard.
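That early visibility can be made concrete with a back-of-the-envelope check. The sketch below uses invented story-point history, and `estimate_inflation` is a hypothetical helper rather than any standard metric; it simply compares recent average estimates for comparable stories in one module against an early baseline:

```python
from statistics import mean

def estimate_inflation(points_by_sprint):
    """Ratio of recent average story points to an early baseline.

    points_by_sprint: per-sprint average estimates for comparable
    stories in one module (synthetic data below).
    """
    baseline = mean(points_by_sprint[:3])  # first three sprints
    recent = mean(points_by_sprint[-3:])   # last three sprints
    return recent / baseline

# Synthetic history: similar-sized stories, same module, eight sprints
history = [3.0, 3.0, 3.5, 4.0, 5.0, 5.5, 6.5, 8.0]
print(round(estimate_inflation(history), 2))  # prints 2.11: estimates have doubled
```

A ratio drifting well above 1.0 for the same kind of story in the same area is exactly the "rising estimates" signal described above, and it is computable from planning data a PM already has.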

What AI Coding Tools Actually Amplify

As AI coding tools become standard in engineering workflows, one question surfaces constantly: do they make the debt problem better or worse? The direct answer is that it depends entirely on what the tools are amplifying.

AI coding assistants generate functional code quickly. They do not enforce a definition of done. They do not decline to write tightly coupled code because refactoring was not allocated sprint capacity. They do not push back on shortcuts because the sprint ends Friday and the stakeholder demo is tomorrow. In a high-debt, high-pressure environment, AI coding tools accelerate both output and debt generation simultaneously, because the shortcuts being made are the problem, and tooling that helps make them faster is not the solution.

The teams seeing genuine quality improvements from tools like Copilot share a common characteristic: strong code review practices, active test coverage requirements, and deliberate refactoring capacity that is protected rather than perpetually deferred. In those environments, the tool reduces friction for work that is already being done well. It catches common patterns quickly, provides a consistent baseline of feedback regardless of reviewer availability, and frees senior engineers from routine issues so they can focus on architectural judgment. That is real value. The caveat is the same one that applies across the category: the tool amplifies what is already in place. The right question before adoption is whether the existing engineering practices are worth amplifying. If the answer is not clearly yes, tooling compounds the problem rather than addressing it.

You Probably Only Have One of These Three Things

Most teams have measurement or visibility. Almost none have coordination.

Measurement is about codebase health. Static analysis platforms like SonarQube surface debt as a quantifiable backlog: specific modules, specific issues, specific effort estimates. Debt that cannot be pointed to cannot be prioritized, and debt that cannot be prioritized accumulates by default. If engineering cannot show concretely where the debt is, making it visible is the prerequisite for everything else.
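As an illustrative sketch of what "pointing to the debt" can look like: SonarQube exposes its issue backlog through a web API, and a query like the one below filters open maintainability issues for a single project. The server URL and project key here are placeholders, and fetching the URL in practice requires an authentication token:

```python
from urllib.parse import urlencode

def debt_query_url(base_url, project_key):
    """Build a SonarQube issues-search URL filtered to open code smells.

    base_url and project_key are placeholders for illustration only.
    """
    params = {
        "componentKeys": project_key,  # the project to inspect
        "types": "CODE_SMELL",         # maintainability issues
        "resolved": "false",           # open issues only
        "ps": 100,                     # page size
    }
    return f"{base_url}/api/issues/search?{urlencode(params)}"

print(debt_query_url("https://sonar.example.com", "payments-service"))
```

The point is not the specific query. It is that debt becomes a list of concrete, countable items attached to specific modules, which is the form planning can actually act on.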

Visibility is about delivery health. Analytics platforms surface PR cycle time, review turnaround, deployment frequency, and lead time across the team. They are diagnostic: useful for understanding where bottlenecks are forming, which areas of the codebase are accumulating churn, where delivery is degrading before it becomes undeniable. What they do not do is act on what they surface. They show you that PR cycle time is increasing. They do not follow up with the reviewer who has not looked at the queue.
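A minimal sketch of the kind of metric these platforms compute, using invented PR records (a real feed would come from your Git host's API, where field names may differ):

```python
from datetime import datetime
from statistics import median

def pr_cycle_times_hours(prs):
    """Hours from PR opened to merged, for merged PRs only."""
    times = []
    for pr in prs:
        if pr["merged_at"] is None:
            continue  # still open; not part of cycle time yet
        opened = datetime.fromisoformat(pr["opened_at"])
        merged = datetime.fromisoformat(pr["merged_at"])
        times.append((merged - opened).total_seconds() / 3600)
    return times

# Synthetic PR records
prs = [
    {"opened_at": "2026-02-02T09:00", "merged_at": "2026-02-03T09:00"},
    {"opened_at": "2026-02-04T10:00", "merged_at": "2026-02-07T10:00"},
    {"opened_at": "2026-02-05T12:00", "merged_at": None},  # still open
]
print(median(pr_cycle_times_hours(prs)))  # prints 48.0 (hours)
```

Tracked sprint over sprint, the median tells you delivery is slowing; it still does not tell anyone to go unblock the queue.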

Coordination is about acting on signals. When a ticket has been sitting in "In Progress" with no commits for 48 hours, when a PR has been open past review thresholds with no activity, when a story is repeatedly reopened, something needs to close that loop. Not a dashboard. A direct follow-up to the person accountable, with context, and an escalation path if the work does not move. That is the layer most engineering teams do not have. It is also the layer where most delivery failures actually live.
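Closing that loop starts with detecting the signal. A minimal sketch, with invented ticket and PR records and assumed thresholds of 48 hours for stalled tickets and 24 hours for unreviewed PRs (the thresholds are a team choice, not a standard):

```python
from datetime import datetime, timedelta

STALE_TICKET = timedelta(hours=48)  # "In Progress" with no commits
STALE_PR = timedelta(hours=24)      # open PR with no review activity

def stale_items(items, now):
    """Return ids of items whose last activity exceeds their threshold."""
    thresholds = {"ticket": STALE_TICKET, "pr": STALE_PR}
    flagged = []
    for item in items:
        last = datetime.fromisoformat(item["last_activity"])
        if now - last > thresholds[item["kind"]]:
            flagged.append(item["id"])
    return flagged

# Synthetic delivery state as of sprint midpoint
now = datetime(2026, 3, 3, 12, 0)
items = [
    {"id": "PROJ-101", "kind": "ticket", "last_activity": "2026-02-28T09:00"},
    {"id": "PR-55",    "kind": "pr",     "last_activity": "2026-03-03T08:00"},
    {"id": "PR-54",    "kind": "pr",     "last_activity": "2026-03-01T08:00"},
]
print(stale_items(items, now))  # prints ['PROJ-101', 'PR-54']
```

Detection is the easy half; the coordination layer is what routes each flagged id to a named owner with context and an escalation path.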

What PMs Can Actually Do

The structural causes of technical debt in agile are as much product and process problems as they are engineering problems. Three things consistently move the needle, and none of them require engineering authority you may not have.

The first is making debt visible in planning. Debt that has a ticket can be groomed, estimated, and prioritized. Debt that lives in engineers' institutional memory gets deferred indefinitely, because it is invisible to anyone not directly working in the affected areas. If the team cannot point concretely to where the debt is, investing in static analysis tooling is the prerequisite for everything else. The right frame is not "technical hygiene." It is a delivery risk.

The second is protecting refactoring capacity consistently. The mechanism matters less than the consistency. Some teams allocate a fixed percentage of each sprint to technical improvement. Others embed refactoring directly into the definition of done for stories touching known debt areas, so the work happens as part of delivery rather than as a separate negotiation that always loses. Teams that never get sprint capacity for technical improvement accumulate debt by design.
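The fixed-percentage mechanism is simple enough to sketch. Assuming a 20% default reservation (the exact figure is a team choice, not a standard):

```python
def sprint_allocation(capacity_points, refactor_pct=0.2):
    """Split sprint capacity into protected technical work and feature work."""
    refactor = round(capacity_points * refactor_pct)
    return {"refactor": refactor, "feature": capacity_points - refactor}

print(sprint_allocation(40))  # prints {'refactor': 8, 'feature': 32}
```

The arithmetic is trivial by design: the value comes from deciding the split before planning starts, so the refactor allocation is never up for negotiation inside the meeting.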

The third is auditing the definition of done in practice, not on paper. Ask engineers what "done" actually meant on the last three sprints. The gap between the written standard and the practiced standard is almost always where debt is entering the system.

How DevHawk Helps With This Model

DevHawk does not fix technical debt directly. No tool does. What it addresses is the coordination layer that makes debt invisible until it is too late.

When a ticket has been sitting in "In Progress" with no commits for two days, DevHawk surfaces it. When a PR is aging past your team's review threshold with no activity, DevHawk follows up with the owner in Slack. When a story is reopened for the third time, the pattern is flagged rather than absorbed into the noise of the sprint board.

The reason this matters for debt specifically: most early debt signals appear in delivery behavior before they appear in estimates. Rising PR review times, stories repeatedly reopened, modules where every ticket takes longer than it should. These are signals. Most teams do not have a system that acts on them. DevHawk is built for that layer, the space between "something is off" and "someone did something about it."

If ownership is unclear, automation amplifies confusion rather than resolving it. DevHawk works best when every ticket has a real owner and "done" means something consistent across the team. The tooling enforces the system. It does not substitute for one.

The Debt Will Not Announce Itself

Sprints will keep closing. Velocity will keep looking fine. And underneath the metrics, the codebase will keep getting harder to work with, one deferred refactor at a time.

The teams that stay ahead of this do not have better engineers or more sophisticated tooling. They have a system that makes debt visible early, protects capacity to pay it down consistently, and refuses to let velocity theater substitute for delivery health.

The moment things usually break is not when the debt is created. It is when it starts compounding faster than the team can absorb.

If your system only works because someone is manually chasing these signals, it is not a system. It is a person.

Frequently Asked Questions

What is technical debt, and why does it matter to PMs?

Technical debt refers to the accumulated cost of shortcuts, workarounds, and deferred improvement work in a codebase. For engineers, it shows up as friction: modules that are difficult to change, tests that are missing, code that is hard to reason about. For PMs, it shows up as something less precise but equally real: estimates that keep rising, features that take longer than they should, and a roadmap that keeps slipping despite the team working at full capacity. Technical debt is not an engineering problem the PM can ignore. It is a delivery risk problem that sits directly inside the PM's accountability.

How can a PM track technical debt without deep technical knowledge?

You do not need to read the code. You need to read the signals: rising estimates on stories that touch familiar areas of the codebase, tickets that are repeatedly reopened, engineers flagging concerns in standups that never make it into planning, and PR review times increasing across the team.

These are early indicators of technical debt, and they are visible in your delivery tooling without requiring you to understand the underlying architecture. The prerequisite is having tooling that surfaces these patterns rather than burying them in raw data.

Why does refactoring work keep losing in sprint planning?

Because sprint planning optimizes for visible output. A refactoring story closes no tickets a stakeholder cares about. It does not appear in a demo. It does not advance a roadmap item. It makes the next sprint go faster, and that benefit is invisible in the current planning meeting. The teams that protect refactoring capacity do not win this argument on merit. They win it by making the rules explicit before the meeting starts: a fixed percentage of capacity reserved, or a definition of done that embeds improvement work into stories touching known debt areas. Without a structural mechanism, the outcome is always the same.


