The Velocity Paradox: Why AI Should Slow You Down
AI speeds up drafting, not understanding. The highest-leverage AI workflow shifts effort from typing to specification, architecture, and verification.
Every AI coding demo sells the same fantasy: less typing, more output, instant 10x.
But typing was never the bottleneck.[1]
Research and field data now point to a more interesting truth. AI usually accelerates the start of the work: boilerplate, first drafts, prototypes, repo navigation, and test scaffolding. Then it hands you a bill in a different currency: ambiguity, verification, review load, and integration work.[2][3]
That is the velocity paradox.
AI can make the inner loop faster while making good engineering feel slower.
And that is not a failure. It is often the whole point.
Drafting got faster. Shipping did not magically get easy.
By 2025, DORA found that 90% of technology professionals were using AI at work and more than 80% believed it improved their productivity.[2] But the same research also found that higher AI adoption can raise both throughput and instability unless teams have strong platforms, clear workflows, small batches, and good automated tests.[2][3]
In other words: AI helps you produce changes faster. It does not remove the need to know whether those changes are correct, necessary, safe, or aligned with user needs.
That distinction matters because software engineers spend only a minority of their time actually writing code. A 2025 Microsoft Research synthesis points to a long-running result: developers spend roughly 15–25% of their time coding, not the majority.[1] The rest is understanding, coordinating, reviewing, testing, debugging, and deciding.
AI is most useful when it attacks the typing tax. It is least impressive when you pretend the typing tax was the whole job.
The clarity tax is now immediate
Before AI, vague thinking often turned into vague systems slowly. You could hide fuzzy requirements inside a week of implementation and only discover the ambiguity in code review, QA, or production.
AI compresses that lag.
Give it a blurry task and it will confidently return a blurry solution in seconds. Give it missing constraints and it will fill the gaps with generic patterns, local optimizations, and whatever defaults happen to be statistically convenient.[4][5]
A weak brief looks like this:
"Add user import."
A usable brief looks more like this:
"Add a CSV user import endpoint that validates email format,
normalizes phone numbers to E.164, de-duplicates by external_id,
returns row-level errors, and stays idempotent on retry."
That difference is not prompt polish. It is system design.
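To make that concrete, here is a minimal sketch of the row-level validation the usable brief implies. It is illustrative only: the field names, the naive phone normalization, and the error shape are assumptions, and a production system would lean on a dedicated library such as phonenumbers for real E.164 handling.

import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_phone(raw: str, default_cc: str = "1") -> str:
    # Naive E.164 sketch: strip punctuation, ensure a leading "+".
    # A real import would use a proper phone-number library.
    digits = re.sub(r"\D", "", raw)
    return "+" + (digits if raw.strip().startswith("+") else default_cc + digits)

def validate_rows(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    # Enforces the brief: valid email, E.164 phone, de-duplication by
    # external_id, and per-row errors instead of aborting the whole batch.
    seen, accepted, errors = set(), [], []
    for i, row in enumerate(rows, start=1):
        problems = []
        if not EMAIL_RE.match(row.get("email", "")):
            problems.append("invalid email")
        ext_id = row.get("external_id")
        if not ext_id:
            problems.append("missing external_id")
        elif ext_id in seen:
            problems.append("duplicate external_id")
        if problems:
            errors.append({"row": i, "errors": problems})
        else:
            seen.add(ext_id)
            accepted.append({**row, "phone": normalize_phone(row.get("phone", ""))})
    return accepted, errors

Every decision in that sketch was already present, implicitly or explicitly, in the one-paragraph brief. The weak brief leaves all of it to chance.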
This is why prompting feels like architecture
The best recent AI practice is not really prompt engineering in the old slogan-heavy sense. It is closer to context engineering: deciding what the model should know, what constraints it should respect, what files or docs matter, what tools it can use, and what good output looks like.[3][5]
That maps surprisingly well onto software design.
Bad prompts look a lot like bad architectures: implicit assumptions, muddy boundaries, missing invariants, and too much stuffed into one place.
Good prompts look a lot like good architectures: explicit goals, clear interfaces, constrained scope, strong defaults, and known failure modes.
DORA now explicitly frames this as spec-driven development: refine user needs and constraints into a real spec first, then let AI operate inside that boundary.[3]
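One lightweight way to practice this, sketched below, is to treat the spec itself as a structured artifact the model must receive before any code is requested. The field names and structure here are assumptions for illustration, not a format DORA prescribes.

from dataclasses import dataclass

@dataclass
class TaskSpec:
    # A minimal, explicit brief: making these fields mandatory pushes
    # ambiguity out of the chat and into a reviewable artifact.
    goal: str                      # the user outcome, not the implementation
    constraints: list[str]         # invariants the change must respect
    non_goals: list[str]           # deliberately out of scope
    context_files: list[str]       # code and docs the model should read
    definition_of_done: list[str]  # observable checks, not vibes

    def is_ready(self) -> bool:
        # A spec with no definition of done is a wish, not a boundary.
        return bool(self.goal and self.definition_of_done)

spec = TaskSpec(
    goal="CSV user import with row-level error reporting",
    constraints=["idempotent on retry", "phones stored as E.164"],
    non_goals=["bulk export", "admin UI changes"],
    context_files=["api/users.py", "docs/import.md"],
    definition_of_done=["re-running the same file creates no new users"],
)
assert spec.is_ready()

The point is not the dataclass; it is that every field forces a decision the model would otherwise make for you.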
The uncomfortable evidence
This is the part the hype usually skips.
In METR’s randomized controlled trial on experienced open-source maintainers working in repositories they knew well, early-2025 AI tools made tasks take 19% longer on average.[6] That does not mean AI is useless. It means benchmark wins and anecdotal speedups do not automatically translate to mature, high-context, quality-sensitive engineering work.[6][7]
At the same time, the models are clearly getting better. METR’s February 2026 update says later data likely points to more uplift than the original study, but the estimate is muddy because many of the developers and tasks with the highest expected benefit increasingly selected out of the experiment; the contributors most enthusiastic about AI often refused to work without it at all.[7]
So the honest version, as of today, is not “AI makes engineers slower” or “AI makes engineers faster.”
It is this:
AI makes some parts of software work dramatically faster, and it changes where the hard part lives.
The hard part moves upstream into definition and downstream into verification.
Benchmarks are rising. Production reality is still stubborn.
This is another reason your workflow should get more deliberate, not less.
On one hand, frontier coding benchmarks and agentic capability have improved quickly. METR’s time-horizon work argues that the length of software tasks frontier agents can complete with 50% reliability has been doubling at roughly a seven-month cadence, and newer systems are materially better at longer, tool-using tasks than the 2024 generation.[8] Modern tools are also increasingly agentic: repo-aware, tool-using, plan-execute-validate-revise systems rather than just smart autocomplete.[9]
On the other hand, benchmark strength still overstates everyday reliability. OpenAI’s SWE-Lancer benchmark, built from more than 1,400 real freelance software engineering tasks, found that frontier models were still unable to solve the majority of tasks.[10]
And once AI-written code lands in real repositories, the debt tail is real. A March 2026 large-scale study across more than 6,000 GitHub repositories found that every major coding assistant studied introduced issues in more than 15% of its commits, and 24.2% of tracked AI-introduced issues still survived in the latest revision.[11]
So yes, faster drafts. Also yes, lingering cleanup.
That is exactly why serious AI workflows need stronger gates than human-only workflows, not weaker ones.
The Rust parallel still works
Rust does not make you “slower” in the way the slogan implies. It forces certain categories of thinking to happen earlier, while the compiler makes the cost of fuzzy mental models immediate.
AI does something similar for product and architecture thinking.
It punishes vague intent early. It exposes missing constraints early. It makes hidden coupling obvious. It reveals whether you actually understand the job well enough to delegate it.
That friction is not overhead. It is feedback.
A better rhythm for AI-assisted engineering
The highest-leverage loop I see looks like this:
- Start with intent, not implementation. Write down the user outcome, constraints, non-goals, and definition of done.
- Define the shape of the solution. Name the modules, boundaries, data flow, and risky edges before you ask for code.
- Give the model real context. Include the relevant files, conventions, schemas, examples, and failure cases. Do not make it guess.
- Keep batches small. AI makes it easy to create giant diffs and harder to review them. Smaller changes keep feedback real.[2][3]
- Push verification left. Ask the model to propose tests, invariants, rollback plans, migration risks, and review checklists before or alongside code (a minimal sketch follows this list).[3][11]
- Fix the spec when the code goes wrong. Do not just patch the output. Improve the instructions, examples, and boundaries so the next pass starts from a better understanding.[4][5]
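Here is what pushing verification left can look like in miniature: a test written from the spec before the real endpoint exists. The in-memory store and function names are hypothetical stand-ins, but the idempotency property they check is the one the brief demanded.

# Hypothetical in-memory stand-in for the import endpoint, just enough
# to state the idempotency property. Storage and transport are elided.
USERS: dict[str, dict] = {}  # keyed by external_id

def import_users(rows: list[dict]) -> int:
    created = 0
    for row in rows:
        ext_id = row["external_id"]
        if ext_id not in USERS:  # an already-imported id is a no-op, not an error
            USERS[ext_id] = row
            created += 1
    return created

def test_import_is_idempotent_on_retry():
    batch = [{"external_id": "u-1", "email": "a@example.com"}]
    assert import_users(batch) == 1  # first attempt creates the user
    assert import_users(batch) == 0  # a client retry creates nothing new
    assert len(USERS) == 1

test_import_is_idempotent_on_retry()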
This feels slower the first few times because you are moving work from silent improvisation into explicit design.
That is exactly why it scales.
The real 10x
The real gain is not lines of code per minute.
It is fewer wrong turns. Fewer giant PRs. Fewer “technically correct, strategically wrong” features. Fewer regressions that pass locally and explode at integration time. Fewer weeks spent discovering that nobody agreed on what the feature actually was.
Speed is only meaningful if it survives contact with reality.
The teams that get the most out of AI are not the ones treating it like a faster keyboard. They are the ones using it as a forcing function for clearer thinking, sharper specs, tighter feedback loops, and stronger quality systems.[2][3][5]
The future belongs to the deliberate
By early 2026, the center of gravity has already started shifting from typing to orchestration: giving better context, steering agents, reviewing larger surfaces, and deciding what should be built in the first place.[4][5][9]
That means the advantage is moving toward engineers who can do four things well:
understand the user, define the system, constrain the machine, and verify the result.
AI is not making great engineers obsolete. It is making vagueness expensive.
And that is why the best AI workflow can feel slower.
Not because you are moving with less force.
Because you are finally paying for clarity upfront instead of with interest later.
References
[1] Microsoft Research. New Future of Work Report 2025 (MSR-TR-2025-58, December 2025). https://www.microsoft.com/en-us/research/wp-content/uploads/2025/12/New-Future-Of-Work-Report-2025.pdf
[2] Google Cloud / DORA. Announcing the 2025 DORA Report (September 23, 2025). https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report
[3] DORA. Capabilities: User-centric focus (January 12, 2026). https://dora.dev/capabilities/user-centric-focus/; see also Capabilities: Working in small batches (December 8, 2025), https://dora.dev/capabilities/working-in-small-batches/; Balancing AI tensions: Moving from AI adoption to effective SDLC use (March 10, 2026), https://dora.dev/insights/balancing-ai-tensions/; and Capabilities: AI-accessible internal data (January 12, 2026), https://dora.dev/capabilities/ai-accessible-internal-data/
[4] Tang, Ningzhi, Chaoran Chen, Zihan Fang, Gelei Xu, Maria Dhakal, Yiyu Shi, Collin McMillan, Yu Huang, and Toby Jia-Jun Li. Programming by Chat: A Large-Scale Behavioral Analysis of 11,579 Real-World AI-Assisted IDE Sessions (arXiv:2604.00436, 2026). https://arxiv.org/abs/2604.00436
[5] Anthropic. Effective context engineering for AI agents (September 2025). https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[6] Becker, Joel, Nate Rush, Beth Barnes, David Rein, and collaborators at METR. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (July 10, 2025). https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
[7] METR. We are Changing our Developer Productivity Experiment Design (February 24, 2026). https://metr.org/blog/2026-02-24-uplift-update/
[8] METR. Measuring AI Ability to Complete Long Tasks (March 19, 2025). https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/; see also Task-Completion Time Horizons of Frontier AI Models. https://metr.org/time-horizons/
[9] Borg, Markus, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, and Dave Farley. Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability (arXiv:2507.00788, 2025). https://arxiv.org/abs/2507.00788; see also AI Agentic Programming: A Survey of Techniques and Applications (arXiv:2508.11126, 2025). https://arxiv.org/abs/2508.11126
[10] OpenAI. Introducing the SWE-Lancer benchmark (February 18, 2025). https://openai.com/index/swe-lancer/
[11] Liu, Yue, Ratnadira Widyasari, Yanjie Zhao, Ivana Clairine Irsan, and David Lo. Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild (arXiv:2603.28592, 2026). https://arxiv.org/abs/2603.28592