Engineering Velocity: What It Measures and Where Organisations Go Wrong

Updated: 07 May 2026 · 7 min read
Andrei, Lead Engineer

Ask ten engineering leaders what they mean by velocity and you will get ten different answers. Some will tell you it is story points per sprint. Others will say it is deployment frequency. A few will mention lead time. The honest ones will admit they are not entirely sure, but they are tracking something in a dashboard and calling it velocity because that is what the business asked for.

This matters because the metric you use shapes the behaviour you get. Track the wrong thing, or track the right thing in the wrong context, and you will build a team that is very good at hitting a number while the actual work of delivering value slows down. This post is about what engineering velocity actually measures, why so many organisations get it wrong, and what to do instead.


The Difference Between Output and Outcomes

The most common mistake in measuring engineering velocity is conflating output with outcomes. Output is story points completed, pull requests merged, features shipped. Outcomes are things that matter to users and to the business: problems solved, time saved, revenue generated, churn reduced.

A team can produce enormous output and deliver almost no outcomes. Features nobody uses, refactors that do not unblock anything, dashboards that nobody looks at. The inverse is also true: a small team working carefully on a well-scoped problem can deliver disproportionate business value with modest output.

Engineering velocity, properly understood, is a measure of how quickly a team moves from idea to real user impact. Not how many story points they completed. Not how often they deploy. How quickly a good idea reaches the people it is supposed to help, and whether it actually helps them.

That framing changes what you measure and, more importantly, what you fix when things are slow.


What the DORA Research Actually Says

The most rigorous body of research on software delivery performance comes from the DORA team at Google, whose annual State of DevOps report has tracked what separates high-performing engineering organisations from struggling ones for over a decade.

Their framework centres on four metrics:

Deployment frequency measures how often an organisation successfully releases to production. Elite performers deploy multiple times per day. Low performers deploy less than once per month.

Lead time for changes measures the time from a code commit to that commit running in production. This is probably the single most useful velocity indicator because it captures the entire pipeline, not just the development phase.

Change failure rate measures what percentage of deployments cause a production failure. High deployment frequency paired with a high failure rate is not velocity; it is noise.

Failed deployment recovery time measures how long it takes to restore service after a failure. Teams that can recover quickly are less afraid to ship, which feeds back into deployment frequency and lead time.
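
To make the four metrics concrete, here is a minimal sketch of how they might be computed from exported deployment records. The data structure and field names (committed_at, deployed_at, caused_failure, restored_at) are illustrative assumptions rather than the output of any particular tool; the point is that all four numbers fall out of a few timestamps per deployment.

```python
from datetime import datetime
from statistics import median

# Hypothetical export: one record per production deployment, with the commit
# timestamp, the deploy timestamp, whether it caused a failure, and (if so)
# when service was restored. Field names are assumptions for illustration.
deployments = [
    {"committed_at": datetime(2026, 5, 1, 9, 30),
     "deployed_at": datetime(2026, 5, 1, 14, 0),
     "caused_failure": False, "restored_at": None},
    {"committed_at": datetime(2026, 5, 2, 10, 0),
     "deployed_at": datetime(2026, 5, 4, 16, 0),
     "caused_failure": True,
     "restored_at": datetime(2026, 5, 4, 17, 30)},
]

period_days = 30

# Deployment frequency: releases to production per day over the period.
deployment_frequency = len(deployments) / period_days

# Lead time for changes: commit to that commit running in production (hours).
lead_hours = median(
    (d["deployed_at"] - d["committed_at"]).total_seconds() / 3600
    for d in deployments
)

# Change failure rate: share of deployments that caused a production failure.
change_failure_rate = sum(d["caused_failure"] for d in deployments) / len(deployments)

# Failed deployment recovery time: time to restore service after a failure (hours).
recovery_hours = [
    (d["restored_at"] - d["deployed_at"]).total_seconds() / 3600
    for d in deployments
    if d["caused_failure"] and d["restored_at"]
]
median_recovery = median(recovery_hours) if recovery_hours else None

print(f"Deploys per day:        {deployment_frequency:.2f}")
print(f"Median lead time (h):   {lead_hours:.1f}")
print(f"Change failure rate:    {change_failure_rate:.0%}")
print(f"Median recovery (h):    {median_recovery}")
```

The exact thresholds matter less than the trend: if lead time and recovery time are falling while failure rate holds steady, the delivery system is getting healthier.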

The key insight from the DORA research is that high-performing teams do not trade speed for stability. They achieve both simultaneously. The assumption that you have to slow down to maintain quality is, the research suggests, simply wrong: the trade-off is a property of poor engineering practices, not an inherent tension.


The Story Points Problem

Story points were never designed as a velocity metric. They were designed as a relative sizing tool to help teams plan how much work to take on in a sprint. Scrum guidance is explicit about this: velocity is for the team's own capacity planning, not for performance measurement.

What happened in practice is that the number escaped the team and ended up in executive dashboards. Once that happened, Goodhart's Law took over. When a measure becomes a target, it ceases to be a good measure. Teams under pressure to hit velocity targets inflate their estimates, take on easier work, or split stories into smaller pieces that look like progress without being any. The number goes up. The outcomes stay flat.

Scrum.org describes this clearly: velocity is a genuinely useful metric when used by the team for planning purposes, and a damaging one when used by people outside the team to rank or compare performance. Story points are a local currency. A team that estimates in units of five is not delivering five times more value than a team that estimates in units of one.

The more insidious version of this problem is that velocity targets can actually slow teams down over time. Teams that optimise for story point throughput tend to accumulate technical debt faster, because debt reduction work is invisible in point counts. The codebase gets harder to work in, build times get slower, and eventually what took a day to ship takes a week. The velocity number may still look fine right up until it collapses.


Where the Real Bottlenecks Hide

The assumption behind most velocity measurement is that the bottleneck is in the development phase: developers writing code. In practice, that is rarely where the time goes.

Map the full journey of a feature from idea to user impact and you typically find something like this:

  • The idea sits in a backlog for weeks before anyone prioritises it.
  • Once prioritised, it waits for a planning session.
  • Development takes a few days.
  • It sits in a review queue for two days.
  • It passes review and waits for a deploy slot.
  • The deploy goes out but feature flags keep it dark for another week.
  • It finally reaches users but nobody has instrumented it, so you do not know if it is working.

Development was three days. The whole journey was four weeks. No amount of sprint velocity optimisation will fix a process that is blocked at five different points outside the development phase.
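
To make that arithmetic explicit, here is a small sketch using hypothetical stage-entry timestamps that roughly match the journey above. The stage names and dates are invented for illustration; the shape of the result, not the specific numbers, is the point.

```python
from datetime import datetime

# Hypothetical stage-entry timestamps for one feature, reconstructed from
# backlog, planning, VCS, review, and deployment tooling.
events = [
    ("backlog",          datetime(2026, 4, 1)),
    ("planning",         datetime(2026, 4, 15)),
    ("development",      datetime(2026, 4, 17)),
    ("review",           datetime(2026, 4, 20)),
    ("awaiting_deploy",  datetime(2026, 4, 22)),
    ("dark_behind_flag", datetime(2026, 4, 23)),
    ("live_to_users",    datetime(2026, 4, 30)),
]

total_days = (events[-1][1] - events[0][1]).days

# Time spent in each stage, as a share of the full idea-to-users journey.
for (stage, start), (_, end) in zip(events, events[1:]):
    days = (end - start).days
    print(f"{stage:18} {days:3d} days ({days / total_days:.0%} of elapsed time)")

# Development is 3 of 29 days: roughly a tenth of the elapsed time.
```

Optimising the development stage in this picture buys you days at best; unblocking the backlog, review, and release stages buys you weeks.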

This is precisely why lead time for changes, as defined by the DORA framework, is more useful than story point velocity. It forces you to look at the whole value stream, not just the part where engineers are writing code.


Four Common Mistakes Organisations Make

Measuring output instead of flow. Counting what gets done inside the development phase while ignoring wait times, handoff delays, and deployment friction. The work looks fast; the delivery is slow.

Comparing velocity across teams. Story points are not a standard unit. A velocity of 40 in one team is not comparable to a velocity of 40 in another team with different norms, different tooling, and different problem domains. Cross-team comparisons based on velocity data produce rankings that reflect estimation culture more than delivery capability.

Setting velocity as a target. Once leadership starts tracking velocity in quarterly reviews and tying it to team health ratings, the metric is compromised. Teams will find ways to hit the number. Whether that correlates with actually delivering better software faster is a separate question, and the answer is usually no.

Ignoring technical debt as a velocity drag. Technical debt is the most consistent long-term drag on engineering velocity, and it rarely shows up in sprint metrics until it is severe. A reasonable practice is allocating roughly 15 to 20 percent of sprint capacity to technical health work. Most organisations under pressure to deliver features deprioritise this until the codebase becomes genuinely painful to work in, at which point velocity drops and nobody is quite sure why.


Velocity in Cloud and Distributed Environments

The infrastructure a team works on shapes how fast they can deliver. This is not a secondary concern. It is central to velocity.

In cloud-native environments, particularly on AWS, the tooling available for CI/CD, infrastructure as code, automated testing, and deployment pipelines can remove entire categories of friction that slow teams down on-premises. When the infrastructure is well-designed, a code change can move from commit to production in minutes. When it is poorly designed, the same change can take days to navigate through manual steps, environment inconsistencies, and deployment processes that require coordination across multiple teams.

This is one of the reasons that cloud migration done well has a measurable impact on engineering velocity. It is not just about cost or scalability. The shift to cloud, when accompanied by good architectural decisions, removes the infrastructure bottlenecks that sit in the middle of the value stream. Poorly executed cloud migration can add them.

The organisations that see the biggest velocity gains from cloud work are those that treat infrastructure as a product that serves the engineering team. Fast, reliable pipelines. Reproducible environments. Infrastructure changes that can be reviewed and deployed with the same confidence as application code. If your cloud consultancy work does not treat developer experience as an output, you are leaving velocity improvements on the table.


What a Better Measurement Approach Looks Like

The goal is not to stop measuring. It is to measure things that are harder to game and more closely connected to what you care about.

Lead time from commit to production is the best single metric for development pipeline health. It is objective, it is hard to inflate, and it captures everything between writing code and reaching users.

Deployment frequency is a useful signal for how confident the team is in their delivery process. Teams that deploy rarely are usually doing so because deployment is painful or risky. Making it safe to deploy more often is good engineering practice, not just a metric to chase.

Change failure rate keeps deployment frequency honest. Deploying ten times a day with a 30 percent failure rate is not velocity; it is instability.

Cycle time by stage is more diagnostic than any single metric. Breaking down the value stream into stages (backlog, planning, development, review, deployment, user adoption) and measuring time in each stage tells you where to focus improvement effort. Most teams find that development is not the bottleneck.
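
Extending the single-feature breakdown shown earlier to a whole backlog is straightforward. The sketch below assumes each work item can be exported as an ordered list of (stage, entered_at) transitions; the stage names and data shape are assumptions for illustration, not any specific tracker's API.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical export: each work item as ordered (stage, entered_at) transitions.
items = [
    [("backlog", datetime(2026, 3, 1)), ("development", datetime(2026, 3, 20)),
     ("review", datetime(2026, 3, 23)), ("deployed", datetime(2026, 3, 27))],
    [("backlog", datetime(2026, 3, 5)), ("development", datetime(2026, 3, 10)),
     ("review", datetime(2026, 3, 12)), ("deployed", datetime(2026, 3, 19))],
]

# Collect days spent in each stage across all items.
time_in_stage = defaultdict(list)
for transitions in items:
    for (stage, start), (_, end) in zip(transitions, transitions[1:]):
        time_in_stage[stage].append((end - start).days)

# Median days per stage; the stage with the largest median is the first place to look.
medians = {stage: median(days) for stage, days in time_in_stage.items()}
for stage, m in medians.items():
    print(f"{stage:12} median {m:.1f} days")
print(f"Likely bottleneck: {max(medians, key=medians.get)}")
```

Medians are used here deliberately: a single item that sat in the backlog for six months would dominate a mean and hide what is typical.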

Business outcomes are the ultimate check. Features shipped, user problems solved, revenue attributed to engineering work. These are hard to measure and easy to argue about, but including them prevents the situation where all the velocity metrics look good and the business is still not getting what it needs.


The Cultural Dimension

The 2024 DORA report highlighted something that technical metrics alone cannot capture: psychological safety is among the strongest predictors of software delivery performance. Teams where people feel safe to raise concerns, flag problems early, and be honest about estimates consistently outperform teams with better tooling but lower trust.

This has direct implications for how velocity is used. When velocity data is used punitively, to rank teams, identify underperformers, or justify headcount decisions, it damages exactly the conditions that produce good performance. Engineers optimise for looking good in the metric rather than doing the right thing. Problems get hidden. Estimates get inflated. The team that looks fastest may be the one that is managing data most carefully.

The organisations with the best engineering velocity tend to use metrics as diagnostic tools, not scorecards. They share the data with the team, invite the team to interpret it, and focus improvement effort on removing friction rather than applying pressure.


Connecting Velocity to Business Outcomes

Engineering velocity is not an end in itself. The reason it matters is that faster delivery of working software, to users who need it, is a genuine competitive advantage. It reduces the time between identifying a problem and solving it. It shortens the feedback loop between building something and learning whether it worked. It makes the organisation more responsive.

For businesses going through cloud migration or significant infrastructure change, velocity often dips before it improves. The investment in better tooling, better pipelines, and better engineering practices pays back over time, but the improvement is not always immediate. Understanding this cycle, and having the right metrics to track it, is part of what good cloud consultancy should help you manage.

If your engineering velocity is not where you need it, the answer is rarely to measure more things or set higher targets. It is usually to map the value stream honestly, find the actual bottlenecks, and fix one thing at a time. That is slower work than buying a dashboard, but it is the work that actually changes the number.



Frequently asked questions

What should engineering velocity actually measure?

Engineering velocity should measure how quickly a team can move from a validated idea to real user impact. Story points, pull requests, and deployment counts can be useful signals, but they are not the outcome. The stronger measure is how quickly valuable change reaches users and whether that change works.

Are story points a good measure of velocity?

Story points are useful for a team planning its own capacity, but they are a poor performance metric. They are relative to each team and easy to distort once they become a target. Using story points to compare teams usually measures estimation culture rather than delivery capability.

Which metrics are better indicators of delivery performance?

Lead time for changes, deployment frequency, change failure rate, recovery time, and cycle time by stage are better indicators of delivery health. Together they show whether work flows smoothly from commit to production and whether speed is being achieved without sacrificing stability.

How does cloud infrastructure affect engineering velocity?

Cloud infrastructure affects velocity because it shapes how easily teams can build, test, deploy, and operate software. Well-designed CI/CD, infrastructure as code, reproducible environments, and automated deployment paths reduce wait time and manual coordination across the value stream.

Where should an organisation start if it wants to improve velocity?

The best starting point is mapping the full value stream from idea to user impact, then identifying where work waits. Most bottlenecks are outside coding itself: backlog delay, review queues, deployment friction, feature flag delays, or missing feedback loops. Fixing one bottleneck at a time is usually more effective than setting higher velocity targets.
