spec-driven development

Spec-Driven Development with AI Coding Agents (2026)

A 2026 guide to spec-driven development with AI coding agents: evidence-grounded specs, reusable team skills, and the context layer that stops agent drift.

AI coding agents can ship a working feature in minutes. The problem in 2026 isn't generation speed — it's drift: confident, plausible code that quietly solves the wrong problem because nobody grounded the work in a real specification. The fix is spec-driven development with AI agents, where an evidence-grounded spec drives the implementation and reusable team conventions keep every agent consistent. This guide explains how that workflow works in practice, and how ZeroShot (the bb CLI at tryzeroshot.com) acts as the context layer that grounds Claude Code, Cursor, Codex, Copilot, and other agents in a shared spec, team skills, and customer evidence — instead of one-off prompts.

The Problem: Agents Code Fast, Then Drift

The bottleneck with AI coding agents is no longer how fast they write code — it's the review-and-rework loop that follows when output drifts from intent. Speed without a spec produces confident, wrong code.

A vague prompt produces a plausible implementation. The agent "finishes" a feature, the diff looks clean, and the tests pass — but it solves the wrong user problem because the prompt was never grounded in actual requirements. The real cost lands in review and rework, not in generation.

The data backs this up:

A 2025 METR randomized controlled trial found experienced open-source developers were roughly 19% slower on familiar tasks when using AI tools — despite believing they were faster. The hidden cost was review and rework.
Google's 2024 DORA report found that a 25% increase in AI adoption was associated with measurable decreases in delivery throughput and stability for some teams, attributed to review burden and trust calibration.
Stack Overflow's 2024 survey found ~76% of developers using or planning to use AI tools, but only ~43% trusted the accuracy of the output.

Drift is the new technical debt. When intent lives only in someone's head or a Slack thread, every agent run re-litigates the same requirements. ZeroShot exists to close that gap: it's the context layer that grounds agents in a shared spec and your team's conventions, so the agent builds against intent instead of guessing at it.

What Spec-Driven Development With Agents Actually Means

Spec-driven development with AI agents is a workflow where an evidence-grounded specification — not a disposable prompt — drives implementation, and reusable team skills keep results consistent across agents and engineers.

It helps to contrast three models:

Prompt-driven (improvisational): You optimize a single throwaway instruction. You get one good — or unlucky — result that nobody can reproduce.
Test-driven (validation-first): You write tests, then code to pass them. Tests confirm correctness but can't express intent or conventions.
Spec-driven (intent-first): You capture the why, the boundaries, and the evidence first, then implement against them. Tests become one form of acceptance criteria inside the spec.

A complete agent spec contains five components:

Intent — the why behind the work.
Constraints — the boundaries the implementation must respect.
Acceptance criteria — the definition of done.
Conventions to follow — your team's actual norms for structure, error handling, and testing.
Links to evidence — the customer signal or requirement that motivated it.

The crucial insight: the spec is the durable artifact; the prompt is disposable. Specs are reviewable, versionable, and shareable across teammates and agents. This matters more with agents than with humans because agents carry no implicit team context between sessions. What a senior engineer "just knows," an agent must be told explicitly every time — unless that knowledge is persisted in a shared spec and skills layer.

The Spec-Driven Workflow, Step by Step

The spec-driven workflow follows five repeatable steps that work regardless of which AI coding agent your team uses. Here's how it runs end to end.

Step 1 — Write or generate the spec from real requirements

Start from evidence, not a hunch. Capture intent, constraints, and acceptance criteria, and link them to the customer signal that motivated the work. Guesswork in produces guesswork out.

Step 2 — Encode team conventions as reusable skills

Make your file structure, error-handling patterns, and testing norms explicit and reusable, so every agent follows the same playbook instead of relying on each engineer's discipline.

Step 3 — Let the agent implement against the spec

Run the agent with the spec and skills loaded as context. The agent now builds against intent and conventions, not a vague instruction.

Step 4 — Review against the spec, not just the diff

A diff review answers "is this code correct?" A spec review answers "does this solve the right problem?" Agents make the first question cheaper, which makes the second question more important. Check the implementation against the acceptance criteria.

Step 5 — Persist the session

Save the session so the next teammate — or the next agent — resumes with full context instead of starting cold. With ZeroShot, every chat, file edit, and tool call is saved and indexed across repo, branch, PR, and commit.

This workflow is intentionally agent-agnostic. It works whether your team runs Claude Code, Cursor, Codex, GitHub Copilot, Gemini CLI, Windsurf, or Amazon Q — which matters because by 2026 most enterprise engineering orgs run more than one agent across teams.

Encoding the Spec Process as Reusable Skills

Skills make the spec-driven process repeatable instead of dependent on each engineer's discipline. BuildBetter's open-source skill packs — BB-Skills on GitHub — turn the workflow above into commands any agent can run.

Three skills anchor the spec workflow:

/bb-specify — turns requirements and customer evidence into a structured spec the agent can implement against.
/bb-plan — breaks the spec into an implementation plan with your conventions applied before any code is written.
/bb-review — reviews the resulting PR against both the spec and your team's actual conventions.

The pack also includes /bb-tasks, /bb-implement, /bb-clarify, /bb-analyze, /bb-checklist, and /bb-constitution, plus testing skills like /trust-but-verify, /app-navigator, and /generate-tests for Playwright.

BB-Skills extend the AGENTS.md standard — the de facto open standard that emerged in 2025–2026 for giving coding agents project context — with composable, conditional packs that load only when relevant. That keeps the agent focused and avoids context bloat.

Treat team conventions as code. Encoding error-handling patterns, file structure, and testing norms as reusable skills eliminates per-engineer config drift and makes onboarding near-instant.

Because skills carry your team's conventions, the same standard applies across every agent and every engineer. There's no per-person config drift, and the skills are free and open source — adopt, extend, and contribute back at github.com/buildbetter-app/BB-Skills.

Grounding Specs in Customer Evidence

A spec is only as good as the requirements behind it — guesswork in, guesswork out. The hardest part of spec-driven development isn't the structure; it's making sure the spec reflects what customers actually asked for.

This is where ZeroShot does something most context tools cannot. With an optional BuildBetter API key, it pulls real customer signals from BuildBetter.ai directly into specs and PR reviews. Instead of building from a product manager's recollection of a feature request, the agent builds from the actual evidence.

That closes the full loop:

Customer evidence — real feedback and signal from users.
Spec — intent and acceptance criteria grounded in that evidence.
Agent implementation — code built against the grounded spec.
Review — the PR checked against both the spec and the original evidence.

The trust gap that Stack Overflow measured — 76% adoption but only 43% trust — narrows when reviewers can trace a feature back to the requirement that motivated it. Evidence-grounded specs and spec-vs-implementation review are exactly the mechanisms that calibrate that trust.

This evidence layer is unique to ZeroShot. Memory-focused context tools encode session history and conventions, but they don't connect specs to customer signal — so teams still risk building the wrong thing faster.

Where ZeroShot Fits: The Context Layer Under Your Agents

ZeroShot is not another coding agent — it's the context layer that sits underneath the agents your team already uses. It supplies the three things agents lack on their own.

Cross-agent, cross-teammate session memory. Every session is saved and indexed across repo, branch, PR, and commit. Six months later, the agent still knows who owned a change, why it was structured that way, and what customer signal drove it.
Team-conventional skills. BB-Skills carry your team's playbook into every agent, so /bb-review and /bb-specify apply your actual conventions everywhere.
Customer-evidence grounding. Signals from BuildBetter.ai flow into specs and reviews so you ship what users asked for.

No one else combines all three. A defining capability is cross-teammate session resume: bb agent-sessions resume picks up any teammate's session on your machine, in any agent, so spec context survives handoffs and onboarding instead of being reconstructed from scratch.

ZeroShot is privacy-first — no data leaves your repo without consent — and open source, so there's no vendor lock-in. It works across Claude Code, Cursor, Codex, GitHub Copilot, Gemini CLI, Windsurf, and Amazon Q, and it's used at scale by Brex, Rappi, PostHog, AppFolio, Clay, Lufthansa, Procore, and Macmillan.

How Spec-Driven Tooling Compares

The honest framing: AI coding agents excel at generation. ZeroShot adds the shared spec, skills, and evidence layer they lack. The table below compares across the dimensions that matter for spec-driven development at team scale.

Tool	Works across multiple agents	Encodes team conventions as skills	Cross-teammate session memory	Customer-evidence-aware	Open source
ZeroShot (ZeroShot)	Yes — Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Amazon Q	Yes — BB-Skills	Yes — resume any teammate's session	Yes — from BuildBetter.ai	Yes
Cursor / Cursor for Teams	No — single agent	Partial (rules)	Limited	No	No
Claude Code	No — single agent	Partial (CLAUDE.md)	No	No	No
Devin	No — autonomous agent	Limited	No	No	No
Cody	Partial	Limited	No	No	Partial
Augment	Partial	Limited	Memory-focused	No	No
ContextPool	Partial	No	Memory-focused	No	Varies
Graphiti	Partial	No	Memory-focused	No	Yes

The two differentiators to highlight: customer-evidence-aware specs and cross-agent portability. Single-agent tools can't solve cross-agent fragmentation, and memory-focused tools don't connect specs to customer signal. ZeroShot does both, which is why it sits under the agents rather than competing with them.

Putting It Into Practice on a Real Team

You don't need to overhaul your stack to adopt spec-driven development — start with one workflow and let the wins compound.

Start with one feature

Adopt /bb-specify and /bb-plan for a single upcoming feature. Measure the rework reduction: how many review cycles did it take to converge versus a comparable prompt-driven feature?

Encode your highest-friction conventions first

Pick the two or three conventions that cause the most review churn — error handling, file structure, testing norms — and encode them as skills. Expand from there. Don't try to capture everything at once.

Make spec-vs-implementation review a default gate

Add /bb-review as a gate before merge. The question shifts from "is the code correct?" to "does this satisfy the acceptance criteria and the original customer evidence?"

Use session resume to fix onboarding and handoff

The most common origin story for adopting a context layer is onboarding and handoff pain — a new hire or a teammate inheriting work who has to reconstruct context a previous agent session already had. bb agent-sessions resume eliminates that cold start.

Treat specs as living artifacts

Specs improve as customer evidence accumulates. A spec written today gets sharper as BuildBetter.ai surfaces new signal, and your skills get sharper as your conventions mature. The compounding is the point.

Frequently Asked Questions

What is spec-driven development with AI coding agents?

It's a workflow where an evidence-grounded specification — capturing intent, constraints, acceptance criteria, and team conventions — drives the AI agent's implementation, and reusable team skills keep results consistent across agents and engineers. The spec is the durable artifact; the prompt is disposable.

How is spec-driven development different from prompt engineering?

Prompt engineering optimizes a single, throwaway instruction. Spec-driven development creates a durable, reviewable, version-controlled artifact grounded in real requirements. Prompts get you one good (or unlucky) result; specs make good results repeatable and reviewable across teammates and agents.

How is it different from test-driven development (TDD)?

TDD is validation-first: write tests, then code to pass them. SDD is intent-and-requirements-first: capture the why, the constraints, and the evidence, then implement. Tests are one form of acceptance criteria inside a spec, but a spec also encodes intent and conventions that tests alone can't express.

Does ZeroShot replace Cursor, Claude Code, or Copilot?

No. ZeroShot is a context layer that sits underneath the agents you already use. It supplies the shared spec, team conventions (skills), and customer evidence those agents lack — and it works across Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, and Amazon Q.

What are /bb-specify and /bb-plan?

They're open-source skills, part of BB-Skills on GitHub. /bb-specify turns requirements and customer evidence into a structured spec the agent can implement against. /bb-plan breaks that spec into an implementation plan with your team's conventions applied before any code is written. /bb-review then checks the resulting PR against the spec and conventions.

How does customer evidence get into the spec?

ZeroShot pulls signals from BuildBetter.ai — real customer feedback — into specs and PR reviews, so teams build what users actually asked for. This closes the loop from customer evidence to spec to implementation to review.

Does it work across multiple agents and teammates?

Yes. Sessions and skills are portable across agents and resumable across teammates. bb agent-sessions resume picks up any teammate's session on your machine, in any agent, so spec context survives handoffs.

Ship at the Speed of Insight

Spec-driven development turns AI coding agents from fast-but-drifting into fast-and-grounded. The spec carries intent, the skills carry your conventions, and customer evidence keeps you building the right thing. ZeroShot is the open-source context layer that makes all three portable across every agent and every teammate.

Ship at the speed of insight. Install ZeroShot.